STUDIES IN THE HISTORY OF PROBABILITY 
AND STATISTICS 
II. THE BEGINNINGS OF A PROBABILITY CALCULUS 


By M. G. KENDALL 


Research Techniques Unit, London School of Economics and Political Science 


1. The first article in this series by Dr F. N. David (1955) has reviewed the development 
of dicing and gaming up to the time of Fermat and Pascal, who are popularly but 
erroneously supposed to have founded the calculus of probability. In this paper I shall 
try to trace the evolution of the idea of a probability calculus with especial reference to 
what Dr David calls the tantalizing period prior to A.D. 1600. 


2. During the dark ages gaming was prevalent throughout Europe. At some unknown 
point of time dice finally ousted tali as instruments of play; and since cards were not 
introduced until about A.D. 1350 gaming must have been conducted mainly with dice for 
nearly a thousand years. Efforts on the part of Church and State to control the evils 
associated with it were as ineffectual then as they are today, and nothing is more indicative 
of the persistence of gambling than the continual attempts made to prevent it. The sermon 
of St Cyprian of Carthage De Aleatoribus (c. A.D. 240) was echoed twelve hundred years 
later in the more famous sermon of St Bernardino of Siena Contra Alearum Ludos of 
A.D. 1423. The gambling of the Germans referred to by Tacitus may perhaps have become 
more moderate but was equally prevalent in the thirteenth century when we find Friedrich II 
(1232) issuing a law de aleatoribus and Louis IX (1255) forbidding not only the play but 
even the manufacture of dice. A long series of edicts prohibiting the clergy from gaming 
(e.g. by Otto der Grosse, A.D. 952, the Councils of Tréves in A.D. 1227 and 12%>, the Council 
of Worcester in A.D. 1240) are themselves eloquent of the failure on the part of the authorities 
to repress the evil. 

3. We must remember, however, that all these bans and prohibitions were not really 
directed against games of chance as such but against the vices which accompanied them. 
There seems to have been nothing impious in creating a chance event or in using it for 
purposes of amusement. The Church was much more concerned about the drinking and 
swearing which accompanied gaming; and the State was more concerned about the idleness, 
thriftlessness and crime which were so often found among gamblers. Chaucer’s Pardoner 
puts the official view of his day by giving an example of the blasphemy which usually 
accompanied a gambling game (almost certainly of hazard)* 


By Goddés precious heart and by his nails 
And by the blood of Christ that is in Hayles 
Seven is my chance, and thine is cinq and trey. 
By Goddés armés, if thou falsely play 
This dagger shall throughout thine herté go !— 
This fruit cometh of the bitchéd bonés two: 
Forswearing, iré, falseness, homicide. 

* Here and elsewhere I have modernized the spelling of the English quotations to some extent where 
the metre permits. 

Even chess, the most innocent of all games, was classed among the major vices, at least 
for officials. The interdict of Louis IX referred to above says: ‘They shall abstain...from 
dice and chess, from fornication and frequenting taverns. Gaming-houses and the manufacture 
of dice are prohibited throughout the realm.’* 


4. San Bernardino enumerates at length fifteen ‘malignitates impiissimi ludi’ but they 
are all moral evils (love of gain, idleness, corruption of youth, etc.) with the exception of 
blasphemy and contempt of the prohibitions of the Church. One feels that if there had 
been anything to say about the impiety of eliciting chance events for innocent entertain- 
ment the saint would certainly have said it. The general attitude of his time seems to have 
been one of toleration of the actual play but stern opposition to its associated vices. There 
is some positive evidence to the same effect. About A.D. 960 a certain bishop Wibold of 
Cambray invented a clerical version of dice to which I shall refer later; this sagacious 
realist evidently recognized the impossibility of stamping out the evil and hence attempted 
to turn it into good. Participants in the third crusade (A.D. 1190) had, in their briefing 
instructions, a carefully drawn up statement of the extent to which they might gamble; 
no person below the rank of knight was permitted to play at all for money; knights and 
the clergy might play but could not lose more than twenty shillings in twenty-four hours. 
Chaucer, in the Franklin’s Tale, refers to the playing of chess and tables (backgammon) 
with the laudable object of distracting the heartbroken Dorigene. In 1484 Margery writes 
to John Paston (24 December): 


Please it you to wit that I sent your eldest son to my Lady Morley to have knowledge what sports 
were used in her house in Christmas next following after the decease of my lord her husband; and she 
said that there were none disguisings, nor harping, nor luting, nor singing nor none loud disputes; but 
playing at the tables (backgammon) and chess and cards; such disports she gave her folks leave to 
play and none other. 


5. We may also notice a series of laws, beginning in the reign of Edward III, prohibiting 
the playing of certain games in order to promote manly sports. An act of Henry VIII 
added dice and cards to the list of unlawful amusements, although Henry, like many other 
monarchs, set a very bad example to his subjects. These laws were militaristic in origin. 
The common people were not to waste their leisure in playing peaceful games like bowls, 
ninepins, hockey and dice; their duty was to practise archery in readiness for the next war. 


6. I recall these facts to establish two points. The first is that playing with dice (and 
later with cards) continued from Roman times to the Renaissance without interruption 
and was practised not only among the educated classes but among the middle classes and 
among the lower classes also. The second is that although the various Governments and 
the Church discouraged gaming to the point of prohibition, a great deal of play went on 
either as innocent pastime or, by popular approval, in defiance of the law. 


7. One of the exasperating features of the many references to dice-playing between 
A.D. 1000 and 1500 is that authors invariably assume that their readers are familiar with 
the games they mention; and hence no rules of play are offered. We are thus very much in 
the dark about the exact nature of the games which were played. There are two in particular 
which have a long and interesting history: hazard, the ancestor of the modern American 
crap game, and primero, the ancestor of poker. It is instructive to consider briefly their 
line of development. 


* ‘Abstineant...a ludo etiam cum taxillis vel aleis vel scacis, et a fornicatione et tabernis. Scolas 
etiam deciorum prohibemus omnino....Fabrica vero deciorum prohibeatur ubique in nostro regno.’ 
Taxillus is a diminutive of talus, but I do not know whether in this context it refers to the talus or the die. 

8. The Romans played with four tali (huckelbones) but with only three tesserae (dice). 
At some early stage versions of dice-playing with only two dice are mentioned; for example, 
Bishop Eustathius, in a commentary on the Odyssey written in A.D. 1180, refers to games 
with two dice. The Chaucerian extract quoted above also mentioned two dice. In 1707 
Montmort wrote of ‘le quinquenove, le jeu de trois dez et le jeu du hazard. Les deux 
premiers sont les seules jeux de dez qui soient en usage en France, le dernier n’est commun 
qu’en Angleterre.’ Both quinquenove and hazard were played with two dice; all three 
games are variants of the same idea. 


9. Hazard, the game as distinct from its modern meaning of chance in general, was, 
I believe, brought back to Europe by the Third Crusaders. Godfrey of Bouillon gives 
a false derivation: ‘A Hazait [Hazar] s’en ala ung riche mandement, et l’apiel-on Hazait 
pour le fait proprement que ly dés fu fais et poins premierment.’ There can be little doubt 
that the word derives from the Arabic al zhar, meaning a die.* Wherever it came from, the 
name and the game must have spread rapidly through Europe. Jean Bodel’s play Le Jeu 
de Saint Nicolas, ascribed to the year A.D. 1200, refers to hazart. Salimbene (the son of 
a crusader), writing about 1287, refers to playing ‘ad azardum alias ad taxillum’. Dante’s 
Purgatorio, written between A.D. 1302 and 1321, refers to azar; and Chaucer (about A.D. 1375) 
uses the word several times. The exact rules of play are not, so far as I am aware, on record. 
There were doubtless many variants. But the quotation from Chaucer given above suggests 
that the essential features of the modern game of craps were present at an early date; 
the addition of the numbers on two dice and the ‘chances’ of each player are clearly 
indicated.† 


10. It might have been supposed that during the several thousand years of dice playing 
preceding, say, the year A.D. 1400, some idea of the permanence of statistical ratios and 
the rudiments of a frequency theory of probability would have appeared. I know of no 
evidence to suggest that this was so. Up to the fifteenth century we find few traces of 
a probability calculus and, indeed, little to suggest the emergence of the idea that a calculus 
of dice-falls was possible. It may be that gamblers had a rough idea of relative frequencies 
of occurrence—it is hard to see how they could fail to acquire such a thing; and as there is 
some evidence of the manufacture of false dice from Roman times onwards there was 
presumably a complementary notion of fair throwing. It may also be that some intelligent 
man worked out the elements of a theory for himself but guarded his secret on account of 
its cash value. But I do not really believe this. Other people tried to do the same thing 
later, but not with permanent success. 


11. The earliest work I know which would seem to have mentioned the number of ways 
in which dice can fall is the game invented by Wibold, referred to above. So far as I am 
aware no contemporary manuscript has survived but an account of the game (a very 
obscure one, incidentally) was given by the chronicler Baldericus in the eleventh century, 
the work being first published in 1615. Wibold enumerated 56 virtues, one corresponding 
to each of the ways in which three dice can be thrown, irrespective of order. Apparently 
a monk threw a die three times, or threw three dice, and hence chose a virtue which he was 
to practise during the next 24 hours. It does not sound much of a game, but perhaps 
I have misunderstood Baldericus’s account. The important point is that the partitional 
falls of dice were correctly counted. There was no attempt at assessing relative probabilities. 

* Libri (1838-41, vol. 2, p. 188) also gives a false derivation: ‘Ce mot vient d’asar, qui en arabe 
signifie difficile’, the difficulty in question being that of obtaining two aces or two sixes. Libri 
undoubtedly got this from the Dante Commentary mentioned in section 14 below. 

† The ‘chance’ at hazard is not a probabilistic one. A player either calls or throws a ‘main’ (e.g. in 
one version, any number from 5 to 9 inclusive). He then throws again and may (a) ‘throw out’, in 
which case the dice pass to his opponent; (b) win outright by throwing a certain score; (c) throw 
another score which becomes his ‘chance’. He then goes on throwing until either the ‘main’ or his 
‘chance’ turns up, losing in the first and winning in the second case. 


12. The use of dice for the purpose of choosing among a number of possibilities may well 
be much older than Wibold and certainly continued for long after his time. There exist 
several medieval poems in English, setting out the interpretations to be placed on the 
throws of three dice. The best-known is the Chaunce of the Dyse which is in rhyme royal; 
one verse for each of the 56 possible throws of three dice. For example, the throw 6, 5, 3 gives 

Mercury that disposed eloquence 
Unto your birth so highly was incline 
That he gave you great part of science 
Passing all folkés heartés to undermine 
And other matters as well define 
Thus you govern your wordés in best wise 
That heart may think or any tongue suffise. 

Another incomplete poem in the Sloane manuscripts also deals with the throws systematically 
but in a different manner; e.g. for 6, 5, 3, 

Thou that has six, five and three 
Thy desire to thy purpose may brought be 
If desire be to thee y-thyght 
Keep thee from villainy day and night. 

Poems like these were doubtless used for elementary fortune-telling: one threw the dice 
to pick the phrase peculiar to oneself. Those mentioned come from what are probably 
early fifteenth-century manuscripts and have a certain interest in connexion with 
divination probabilities. For my present purposes the point to be noticed is that, for 
purely astragalomantic reasons, the different possible throws were enumerated and known 
without any reference to gaming or a probabilistic basis. 


13. A similar idea is expressed in San Bernardino’s sermon of A.D. 1423. The Saint makes 
a very detailed comparison of the Church of Christ and the church of Satan, represented 
in this instance by gaming. The Church corresponds to the gaming house; the altar to the 
gaming table; the sacrificial vessels to the dice box; and so on. In the middle of all this 
nonsense occurs a passage which does a little to compensate us for having to read it. ‘The 
missal I compare to the die; for in flexibility, permanence and scope it is in no way inferior 
to the missal of Christ himself; and just as that missal is composed of a single alphabet of 
twenty-one letters, so in the [game of] dice there are twenty-one throws.’* 

The twenty-one possible throws are undoubtedly those with two dice. This number is 
correct on the interpretation of the indistinguishability of partitions. One cannot help but 
admire the twisted ingenuity of the comparison, or speculate on what the Saint would have 
said had he been dealing with the gospel in Greek or Aramaic. 
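The two counts quoted so far, 56 throws of three dice and 21 throws of two dice, are both counts of throws in which the order of the dice is ignored. A minimal sketch of the enumeration follows; the function name `unordered_throws` is a label chosen here for illustration, and the closed form C(n + 5, n) is the standard count of multisets, not something stated by Wibold or the Saint.

```python
from itertools import combinations_with_replacement
from math import comb

FACES = range(1, 7)

def unordered_throws(n_dice):
    """List every throw of n_dice six-sided dice, ignoring the order of the dice."""
    return list(combinations_with_replacement(FACES, n_dice))

two_dice = unordered_throws(2)
three_dice = unordered_throws(3)
print(len(two_dice), len(three_dice))

# The same counts from the multiset formula C(6 + n - 1, n).
assert len(two_dice) == comb(7, 2)
assert len(three_dice) == comb(8, 3)
```

Each listed throw is a non-decreasing tuple such as (3, 5, 6), which stands for every ordering of those three faces.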


* ‘Missale vero taxillum, esse volo: qui quidem, et tractabilior, et durabilior, atque continentia non 
erit minor, quam sit missale ipsius Christi, cum in eius missali solum alphabetum, hoc est viginti una 
literae comprehendantur, ac totidem puncta in decio concludantur.’ 

14. The earliest approach to the counting of the number of ways in which three dice 
can fall (permutations included) appears to occur in a Latin poem De Vetula. This remarkable 
work was regarded as Ovid’s for some time and is included among certain medieval 
editions of his poems. It is, however, supposititious and several candidates have been 
proposed for authorship. The one generally preferred is Richard de Fournival (1200-50), 
a gifted humanist of the Middle Ages and Chancellor of the cathedral of Amiens. If this 
is correct the poem was presumably written between A.D. 1220 and 1250. It contains a long 
passage dealing with sports and games, and with dicing in particular.* It is, perhaps, worth 
giving in full what is (if genuine) the first known calculation of the number of ways of 
throwing three dice. The text (taken from an edition published at Wolfenbüttel in 1662) 
is given in an appendix (pp. 13-14 below). 
The relevant passage may be briefly and freely construed as follows: 


If all three numbers are alike there are six possibilities; if two are alike and the other different there 
are 30 cases, because the pair can be chosen in six ways and the other in five; and if all three are 
different there are 20 ways, because 30 times 4 is 120 but each possibility arises in 6 ways. There are 
56 possibilities. 

But if all three are alike there is only one way for each number; if two are alike and one different 
there are three ways; and if all are different there are six ways. The accompanying figure shows the 
various ways. 


[It follows, but is not stated, that the total number of ways is 
(6 × 1) + (30 × 3) + (20 × 6) = 216.] 
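The three cases of the De Vetula can be checked by listing all ordered falls of three dice and grouping them by pattern. This is only a sketch in modern terms; the labels and the helper `pattern` are introduced here for illustration and are not the poem's vocabulary.

```python
from collections import Counter
from itertools import product

# Every ordered fall of three dice, first die, second die, third die.
ordered_falls = list(product(range(1, 7), repeat=3))

def pattern(throw):
    """Classify a throw by how many distinct faces it shows."""
    return {1: "all alike", 2: "two alike", 3: "all different"}[len(set(throw))]

# Count falls with order kept, and with order discarded (sorted tuples in a set).
with_order = Counter(pattern(t) for t in ordered_falls)
without_order = Counter(pattern(t) for t in {tuple(sorted(t)) for t in ordered_falls})

for name in ("all alike", "two alike", "all different"):
    per_case = with_order[name] // without_order[name]
    print(name, without_order[name], "cases x", per_case, "ways =", with_order[name])
print("total", sum(without_order.values()), "cases,", len(ordered_falls), "ways")
```

The multipliers 1, 3 and 6 are the numbers of orderings of each kind of throw, which is the step from 56 to 216.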

15. The accompanying figures are shown in Plates 1 & 2, taken from Harleian MS. 5263. 
The figure referred to above is also given in my Fig. 1, taken from the printed edition published 
at Wolfenbüttel in 1662. The total of the last column is 108 which, doubled, gives us the 
total number of ways of throwing three dice. If this is a thirteenth-century product (the 
manuscript seems to be fourteenth century) it is astonishingly in advance of its time; and 
some of the phrases have a very modern ring (e.g. ‘tria schemata surgunt’, three cases 
arise; ‘quemlibet cum dederis, reliqui duo permutant loca’, if you fix one, the others 
permute in two ways). 


16. There exists a medieval translation into French of the De Vetula, edited and published 
in 1862 by H. Cocheris, who is mainly responsible for the theory that de Fournival was the 
author. The translation takes considerable liberties with the original text and is not always 
easily matched against it. This poem is attributed to the fourteenth century. So far as 
I can see, the translator seems to have failed to understand the main point. He merely 
enumerates the 16 possible scores with three dice and points out that some of them occur 
more often than others. The essential step in the De Vetula has been lost. 


17. In the sixth canto of the Purgatorio Dante mentions the game of hazard: 


Quando si parte il giuoco della zara 
Colui che perde si riman dolente 
Ripetendo le volte e tristo impara: 


(When a game of hazard breaks up the loser remains behind mournfully recalling the throws and 
learning by sad experience.) 


A commentary on this passage published in 1477 says 


Concerning these throws it is to be observed that the dice are square and every face turns up, so 
that a number which can appear in more ways [sc. as the sum of points on three dice] must occur 
more frequently, as in the following example: with three dice, three is the smallest number which can 
be thrown, and that only when three aces turn up; four can only happen in one way, namely as two 
and two aces. 


* I am not competent to express an opinion about the attribution of the De Vetula to de Fournival, 
but I have considerable doubts on the point if the later printed versions correctly record what the 
author wrote about dice. Some of the critical passages may, however, be interpolations by later hands. 

At this point the author seems to be on the verge of the usual fallacy that a three will 
occur equally as frequently as a four. But (more by luck than knowledge, in my view) he 
veers off this point and proceeds: ‘and so, as these numbers can only happen in one way 
at each throw, in order to avoid tedium and too long a wait, they are not reckoned in the 
game and are called hazards. And so for 17 and 18.... The numbers in between can happen 
in more ways; the number which can happen in most ways is said to be the best throw 
of the set.’ 
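The distinction the commentator narrowly misses can be set out by counting, for each score with three dice, both the partitions (order ignored) and the ordered falls. The sketch below assumes fair six-sided dice; the variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

FACES = range(1, 7)

# Number of ways of making each score, order ignored and order kept.
partitions = Counter(sum(t) for t in combinations_with_replacement(FACES, 3))
orderings = Counter(sum(t) for t in product(FACES, repeat=3))

print("score  partitions  orderings")
for score in range(3, 19):
    print(f"{score:5d}  {partitions[score]:10d}  {orderings[score]:9d}")

# Scores 3 to 10 mirror scores 18 to 11, so half the table suffices.
half_total = sum(orderings[score] for score in range(3, 11))
print("orderings for scores 3 to 10:", half_total)
```

A four arises from the single partition 2, 1, 1, as the commentary says, but from three ordered falls, so it is three times as frequent as a three. The half-table total is the 108 which, doubled, gives 216.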

[Fig. 1. From the printed edition of the De Vetula published at Wolfenbüttel in 1662.] 

Quinquaginta modis & sex diversificantur 
In punctaturis, punctaturaeque ducentis 
Atque bis octo cadendi schematibus, quibus inter 
Compositos numeros, quibus est lusoribus usus, 
Divisis, prout inter eos sunt distribuendi, 
Plene cognosces, quantae virtutis eorum 
Quilibet esse potest, seu quantae debilitatis: 
Quod subscripta potest tibi declarare figura. 

18. From these passages it seems clear that by the end of the fifteenth century the 
foundations of a doctrine of chance were being laid. The necessary conceptualization of the 
perfect die and the equal frequency of occurrence of each face are explicit. The idea of 
attaching binomial coefficients to the possibilities with two or more dice in order to 
calculate their relative frequency of occurrence had occurred in the De Vetula but seems 
to have been lost to sight. Not until 1556 did Tartaglia publish the scheme now known 
(very unjustly) as Pascal’s arithmetic triangle, and then not in a probabilistic context. 
Nevertheless, if it is the first step which counts, that step had already been made by 
A.D. 1500. 

19. Although the pioneer step in the De Vetula seems to have been overlooked by later 
writers in the next two centuries, the idea of enumerating the ways of obtaining given 
scores when permutations were taken into account must have been rediscovered by the 
beginning of the sixteenth century; for Cardano’s De Ludo Aleae contains the essential 
ideas and is dated on internal evidence as written about 1526.* Oddly enough, however, 
we find the first problems in probability (so far noticed in the records) in quite a different 
context. 


20. Fra Luca dal Borgo, or Paccioli, was an itinerant teacher of mathematics whose 
Summa de Arithmetica, Geometria, Proportioni et Proportionalita, published in 1494, was 
widely studied in Italy. He considers a simple version of what later became known as the 
problem of points: A and B, playing at a fair game (not dice, but balla, presumably a ball 
game) agree to continue until one has won six rounds; but the match has to stop when 
A has won five and B has won three. How should the stakes be divided? 


21. Paccioli makes very heavy weather of this, but his solution amounts to saying that 
the stakes should be divided in the proportion 5:3. The error was noted by Tartaglia in 
his monumental General Trattato of 1556 (which date, we may remark, is thirty years after 
Cardano says that he was in possession of the basic principles embodied in the De Ludo 
Aleae). Tartaglia was always glad to point out errors in Paccioli with an acid superiority 
which foreshadows many of the modern writings on probability and statistics. He would 
have been more justified on this occasion if the alternative solution which he propounds 
had been correct, which it is not. He points out that according to Paccioli’s rule, if A had 
won one game and B none, A would take all the stakes, which is obviously unjust. He 
then argues that the difference between A’s score (five) and B’s score (three) being two, 
and this being one-third of the number of games needed to win (six), A should take one 
third of B’s share and the total stake should be divided in the ratio 2:1. Or so I interpret 
his rather prolix discussion. It would appear that if A has x and B y games in hand when 
the total number required to win is z, Tartaglia’s rule requires that A takes a proportion 
1/2 + (x − y)/(2z) of the stake. 
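The two rules can be compared with the division that later became standard, in which A's share is his chance of winning the unfinished match. The sketch below takes Paccioli's rule as division in proportion to games won and Tartaglia's rule as interpreted above; `fair_share` is not from either author and assumes each remaining game is an even chance.

```python
from fractions import Fraction
from math import comb

def paccioli_share(x, y, z):
    """A's share when the stake is split in proportion to games won (z is unused)."""
    return Fraction(x, x + y)

def tartaglia_share(x, y, z):
    """A's share under the rule 1/2 + (x - y)/(2z)."""
    return Fraction(1, 2) + Fraction(x - y, 2 * z)

def fair_share(x, y, z):
    """A's chance of reaching z wins first, every game being an even chance."""
    a_needs, b_needs = z - x, z - y
    n = a_needs + b_needs - 1  # this many further games always settle the match
    favourable = sum(comb(n, k) for k in range(a_needs, n + 1))
    return Fraction(favourable, 2 ** n)

for rule in (paccioli_share, tartaglia_share, fair_share):
    print(rule.__name__, rule(5, 3, 6))
```

For the match stopped at five games to three, the three rules give A shares of 5/8, 2/3 and 7/8 respectively. With one game to none, `paccioli_share(1, 0, 6)` returns the whole stake, which is the injustice Tartaglia objected to.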


22. Two years after the Trattato there appeared a short work by G. F. Peverone, Due 
Brevi e Facili Trattati, il Primo d’Arithmetica, l’Altro di Geometria. In the first of these 
Peverone considers a similar problem, without reference to other writers. A has won 
7 and B 9 games in a match going to 10 games. He gives two examples, which are 
effectively the same, and argues in this way: 


A should put 2 crowns and B 12 crowns [or, equivalently, the stake should be divided in the pro- 
portion 1:6]. For if A, like B, had one game to go each would put two crowns [or divide the stakes 
in equal proportions]. If A had two games to go against B’s one, he should put 6 crowns against B’s 
two, because, by winning two games he would have won four crowns, but with the risk of losing the 
second after winning the first; and with three games to go he should put 12 crowns because the 
difficulty and risk are doubled.† 


* Dr David informs me that she is convinced that Cardano obtained the substance of his work on 
gambling from other sources. This would be in accordance with Cardano’s character, for he was not 
an originator in spite of his extensive knowledge and peculiar gifts. On the other hand, it must remain 
a conjecture until those sources can be traced. 

† ‘Se giuocassero a 1 giuoco, bastarebbero scutti 2; et a due giuochi 6, per che vincendo solo 2 giuochi 
guadagnarebbe scutti 4; ma questo sta con pericolo di perdere il secondo, vinto il primo: però deve 
guadagnare scutti 6, et a 3 giuochi scutti 12, per che si indoppia la difficoltà e pericolo.’ 

23. I think this must be one of the nearest misses in mathematics. As far as the second 
game the argument is correct. If B has one game to go and is staking two crowns, then 
for A: 

with one game to go he stakes 2 crowns, 
with two games to go he stakes 2 + 4 = 6 crowns, 
with three games to go he stakes 2 + 4 + 8 = 14 crowns, 


and so on. Peverone was perfectly well acquainted with geometrical progressions and uses 
the word progressione in one exposition of his answer to this problem. Having got as far 
as the staking of 6 crowns by A with two games to go, if he had only stuck to his own rule 
and considered the conditional probabilities of gain more closely, he would have solved 
this simple case of the problem of points, in essence, nearly a century before Fermat and 
Pascal. 
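The progression 2, 6, 14 can be recovered from the conditional argument directly. In the sketch below one side needs a run of wins, the other needs a single win, and every game is assumed to be an even chance; `balancing_stake` is a name chosen for illustration, and it returns the amount that makes the wager fair against a stake of 2 crowns.

```python
from fractions import Fraction

def balancing_stake(games_to_go, opposing_stake=2):
    """Amount that balances opposing_stake when one side must win
    games_to_go games running and the other needs only one game."""
    p_run = Fraction(1, 2) ** games_to_go  # chance of the unbroken run of wins
    p_single = 1 - p_run                   # chance that the run is broken
    return opposing_stake * p_single / p_run

for n in (1, 2, 3, 4):
    series = sum(2 ** k for k in range(1, n + 1))  # 2 + 4 + ... + 2^n
    print(n, balancing_stake(n), series)
```

The stake equals 2(2^n − 1), the sum of the geometric progression 2 + 4 + ... + 2^n, so that three games to go calls for 14 crowns and not the 12 obtained by doubling.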


24. Students of the modern versions of hazard and primero have been struck by the 
accurate judgement of probabilities which they embody. The chance of success of the first 
player at craps, for example, should be 1/2 and is actually 244/495; the relative values of 
flush and straight at poker are correct although intuitively it is not clear what the order 
should be. It seems, however, that this situation has been reached empirically and not by 
calculation. Cardano fortunately gives us an account of primero as known to him.* From 
the pack of 52 the eights, nines and tens were removed, leaving 40. Four were dealt to 
each player two at a time. The cards had individual values, two counting 12, three 13, 
four 14 and five 15; six counted 18 and seven 21; an ace 16 and court cards 10. There 
were five combinations: 

(a) Numerus (two or three cards of the same suit); 
(b) Primero (all cards of different suits); 
(c) Supremus (the three cards 7, 6, ace in the same suit); 
(d) Fluxus (four cards of the same suit); 
(e) Chorus (all cards of the same denomination). 


These were valued in that order, a primero beating a numerus and so forth. The categories 
do not overlap and if two players held the same combination the elder hand won, 
irrespective of suit. 

Now the chances of these events, or rather the number of ways in which they can occur 
on random drawing, are 


Chorus                                   10 
Fluxus                                  840 
Supremus                                120 
Primero                               9,990 
Numerus: 
    Two of a suit        54,000 
    Three of a suit      14,280 
    Two pairs            12,150 
                                     80,430 
                                     ------ 
Total                                91,390 

* In the Middle Ages many games were played in several forms. For example, there were about 
a dozen different versions of chess. Primero as described by Cardano does not seem to have incor- 
porated a draw for new cards. Sir John Harington, in the reign of Elizabeth I, refers to a later version 
called ‘prime’ which does. Robert Greene, in 1591, makes a character say ‘what will you play at, 
primero, primo visto, sant, one-and-thirty, new cut, or what shall be the game?’ Shakespeare also 
mentions primero. I have seen it stated that the game (more properly primera) was imported from 
Spain on the occasion of the marriage of Mary Tudor with Philip II. The terms used in it certainly 
suggest a Spanish origin, but I do not see why the marriage of Henry VIII with Catharine of Aragon 
could not have been the occasion, or, indeed, that a specific occasion need be invoked. 

In other words, the relative values of the fluxus and the supremus in Cardano’s version 
were in the inverse order of their probabilities. At what point the present (correct) order 
in the modern game of poker emerged, I do not know, but it seems to have been before 
anyone was in a position to calculate the chances and persuade his fellows on the basis of 
mathematics that the orders should be reversed. In my opinion relative chances were all 
reached on the basis of intuition or trial and error in the games played up to the middle of 
the seventeenth century. 
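Both sets of figures in this section can be checked by enumeration. The sketch below makes two assumptions that are not in the text: craps is taken under the usual modern pass-line rules (7 or 11 wins at once, 2, 3 or 12 loses, any other total becomes a point to be thrown again before a 7), and the primero categories are read as non-overlapping in the way described above, so that a chorus is not also counted as a primero nor a supremus as three of a suit. The card labels and the helper `classify` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

# Craps under the assumed modern rules.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
prob = {total: Fraction(n, 36) for total, n in ways.items()}
craps_win = prob[7] + prob[11]
for point in (4, 5, 6, 8, 9, 10):
    # The point wins if it reappears before a 7.
    craps_win += prob[point] * prob[point] / (prob[point] + prob[7])

# Primero: 40 cards (eights, nines and tens removed), four to a hand.
RANKS = ["A", 2, 3, 4, 5, 6, 7, "J", "Q", "K"]
DECK = [(rank, suit) for rank in RANKS for suit in "CDHS"]

def classify(hand):
    """Name the combination held in a four-card hand."""
    suits = Counter(suit for _, suit in hand)
    shape = sorted(suits.values(), reverse=True)
    if len({rank for rank, _ in hand}) == 1:
        return "chorus"
    if shape == [4]:
        return "fluxus"
    if shape == [3, 1]:
        long_suit = suits.most_common(1)[0][0]
        trio = {rank for rank, suit in hand if suit == long_suit}
        return "supremus" if trio == {7, 6, "A"} else "three of a suit"
    if shape == [1, 1, 1, 1]:
        return "primero"
    return "two pairs" if shape == [2, 2] else "two of a suit"

counts = Counter(classify(hand) for hand in combinations(DECK, 4))

print("craps:", craps_win)
for name, n in counts.most_common():
    print(f"{name:16s}{n:7,d}")
print(f"{'total':16s}{sum(counts.values()):7,d}")
```

On these assumptions the thrower's chance at craps is 244/495, and the primero hands number 10 for chorus, 840 for fluxus, 120 for supremus and 9,990 for primero out of 91,390 possible hands, so that the supremus is indeed rarer than the fluxus which outranked it.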


25. It seems clear that in fifteenth-century Italy the basic problems of chance in gaming 
had been raised and some small progress made towards solving them. A more thorough 
examination of the Italian mathematical books of the period may reveal further evidence 
on the point. One suspects that some of the simpler problems were circulated as a kind 
of puzzle, just as they are at the present day, without becoming of any recognized scientific 
importance. Galileo in his fragment Sulla Scoperta dei Dadi, written some time before 
1642 (the date of his death), gives a complete solution of a problem in direct probability 
by correct enumeration of all possibilities, and he writes as if the problem were a new one, 
mentioning no previous authors.* Nevertheless, if Cardano’s treatise is to be correctly 
assigned to 1526 the ideas must have been current for a century before Galileo wrote. It 
would appear that a calculus of probability not only was late in developing but that, once 
begun, it progressed exceedingly slowly. 


26. Before we consider the reasons for this, something remains to be said about 
developments in France in the first half of the seventeenth century. The cradle of the 
probability calculus was undoubtedly, in my opinion, in Italy. From the fourteenth 
century, however, there were close connexions between France and Italy of a political as well 
as a geographical kind and an intellectual movement in one often generated a sympathetic 
movement in the other. The invasion of Italy by Charles VIII in 1494, though militarily 
and politically a failure, is generally regarded as a useful piece of intellectual cross- 
fertilization. Undoubtedly, a great many Italian works of art and ideas found their way 
to France with the remnant of Charles’s army, although I doubt whether a copy of Paccioli’s 
book was amongst them. In this case also a search among French books on mathematics 
written between A.D. 1400 and 1650 might prove to be very instructive. 


27. The lack of written references to problems in the probability calculus is not 
necessarily indicative of a lack of contemporary interest. Knowledge of chances was so 
rudimentary that any capacity to gauge them accurately in play was worth a good deal 
of money. Huyghens, visiting France in 1657, found intense interest being taken in the 
doctrine of chances among mathematicians but encountered also a certain coyness about 
the disclosure of results. This was presumably due to fear of anticipatory publication 
rather than loss of income. Huyghens, being the man he was, merely worked out the theory 
for himself. A Latin translation of his little book, De Ratiociniis in Ludo Aleae, printed by 
van Schooten in 1657, was the first book published on the probability calculus and exercised 
a profound influence on James Bernoulli and Demoivre. 


* This is not a very weighty consideration. Early writers on probability, like those of the present 
day, often failed to mention their indebtedness to their predecessors. Laplace was notoriously bad at it. 

28. Now we come to the most interesting question of this period. Why was it that the 
calculus of probabilities was so long in emerging? We cannot suppose that the Greeks were 
incapable of making the necessary generalizations, even if they were hampered in working 
out the details by their arithmetic and algebra. The same is true of the Arabs and of the 
early medieval Europeans. Dr David has suggested that imperfections in the dice may 
have something to do with it, but I cannot believe that this was a major reason. Some of 
the dice were, in fact, quite well made. The races which built the Parthenon, Trajan’s 
Column, St Sophia and Notre Dame were quite capable of turning out a few cubes as good 
as any of those in current use. Nor do I think backward mathematical notation had much 
to do with it. We have seen that the partitional falls of dice were counted without difficulty 
in the tenth century. Four other possibilities are worth examining: 


(a) the absence of a combinatorial algebra (or at any rate, of combinatorial ideas) ; 

(b) the superstition of gamblers; 

(c) the absence of a notion of chance events; 

(d) moral or religious barriers to the development of the idea of randomness and chance. 


29. Combinatorial algebra does not seem to have been cultivated by the ancients. 
Interest in it awoke in the sixteenth and seventeenth centuries. Leibniz published a tract 
De Arte Combinatoria in 1666 and Wallis a De Combinationibus Alternationibus et Partibus 
Aliquotis Tractatus in 1685. Doubtless the essential ideas could be traced back a good deal 
earlier.* Thus, when the calculus of probability was really under way, a combinatorial 
algebra lay ready to hand. Nevertheless, it seems to me, the absence of such an algebra 
cannot be held to account for the late emergence of the doctrine of chance. Cardano 
managed well enough without it and Galileo’s enumeration of the 216 ways of throwing 
three dice is perfect, though apparently based only on arithmetical methods. 
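
The enumeration can be checked mechanically. The following Python sketch is not part of the historical sources, and its variable names are illustrative only. It lists every fall of three dice, groups the ordered falls by unordered partition, and recovers the counts given in the extract from De Vetula in the Appendix: 56 partitions (6 with three like faces, 30 with one pair, 20 with all faces different) and 6×1 + 30×3 + 20×6 = 216 ordered falls.

```python
from collections import Counter
from itertools import product

# Every ordered fall of three six-sided dice.
throws = list(product(range(1, 7), repeat=3))

# Group ordered falls by unordered partition (the sorted faces); the count
# attached to each partition is the number of orderings it admits (1, 3 or 6).
orderings = Counter(tuple(sorted(t)) for t in throws)

# Number of partitions showing 1, 2 or 3 distinct faces.
kinds = Counter(len(set(p)) for p in orderings)

# Number of ordered falls giving each total from 3 to 18.
ways = Counter(sum(t) for t in throws)

print(len(orderings), len(throws))
print(kinds[1], kinds[2], kinds[3])
print([ways[s] for s in range(3, 19)])
```

The middle totals are the most frequent and the extreme totals the rarest, which is the point made in the opening lines of the extract.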


30. The superstition of gamblers is well known and has been remarked upon by many 
early writers. If men were logical and observant beings, one would have deemed it impossible 
for any person to engage very much at play and, at the same time, to believe that the 
favours of fortune were distributed unequally in the long run. But it seems quite possible 
for a player to believe two incompatible propositions; and with sufficient ingenuity, 
I suppose, it is also possible to reconcile a belief in the law of large numbers with a belief 
that the luck will change if one takes a different chair. One can say much on this subject. 
I am content to record the opinion that, although the psychology of the gambler may have 
done something to hinder the development of a concept of probabilistic law, it cannot have 
prevented the leading minds of the age from arriving at such a concept. 


31. If we discount such factors as ill-made dice, indifferent mathematical expertise, 
superstition and so forth; and if we agree that play with dice and cards was so prevalent 
as to arouse general interest among intelligent people; then we seem driven to the con- 
clusion that the late emergence of the probability calculus was due to some more funda- 
mental factor. The very notion of chance itself, the idea of natural law, the possibility 
that a proposition may be true and false in fixed relative proportions, all such concepts 
are nowadays so much part of our common routines of thought that perhaps we forget 
that they were not so to our ancestors. It is in basic attitudes towards the phenomenal 
world, in religious and moral teachings and barriers, that I incline to seek for an explana- 
tion of the delay. Mathematics never leads thought, but only expresses it. 


* The origins of combinatorial algebra would themselves make an interesting historical study. 
Wallis, at the age of 25, established a reputation for himself by deciphering Royalist letters intercepted 
during the Civil War. Bacon, in the reign of Elizabeth I, also took a keen interest in cryptography. 
Both men used ciphers based on combinations of symbols. 
32. The Greeks and the Romans (so far as one can make summary statements about 
races whose members held such differing views) seem, on the whole, to have regarded the 
world as partly determined by chance. Gods and goddesses had influence over the course 
of events and, in particular, could interfere with the throwing of dice; but they were only 
higher beings with superhuman powers, not omnipotent entities who controlled every- 
thing. And the vaguer deities, Fortuna, the Fates and Fate itself appear to modern eyes 
more in the retributive role of a personified guilty conscience than as masters of the 
universe. The situation was radically changed by Christianity. For the early fathers of 
the Church the finger of God was everywhere. Some causes were overt and some were 
hidden, but nothing happened without cause. In that sense nothing was random and 
there was no chance. ‘Nos eas causas’, says St Augustine, ‘quae dicuntur fortuitae (unde 
etiam fortuna nomen accepit) non dicimus nullas, sed latentes; easque tribuimus vel veri 
Dei, vel quorumlibet spirituum voluntati.’ This view prevailed also in medieval times. 
Thomas Aquinas, arguing that everything is subject to the providence of God, mentions 
explicitly the objection that, if such were the case, hazard and luck would disappear. He 
replies that there are universal and particular causes; a thing can escape the order of 
a particular cause but not of a universal cause; and so far as it escapes it is said to be 
fortuitous with respect to that cause. St Thomas has an Aristotelean view of primary and 
secondary causes but we need not follow closely his struggles with the problems of causality, 
predestination and free-will. He reflected the spirit of his age, wherein God and an 
elaborate hierarchy of His ministers controlled and fore-ordained the minutest happening ; 
if anything seemed to be due to chance that was our ignorance, not the nature of things.* 


33. St Thomas is sometimes quoted as having expressed himself in favour of a frequency 
theory of probability, but, in my opinion, this rests on a source of confusion which it may 
be useful to remove in passing. Throughout this article I have been speaking of the doctrine 
of chances (which Demoivre translated as Mensura Sortis), not probability in the wider 
sense. Early writers used probabilitas with a different meaning, as relating to the degree 
of doubt with which a proposition is entertained. At the outset of our science the two 
things were distinct and it is a pity that they have not remained so and that our language 
has tended to confuse them. It seems to have been James Bernoulli who first thought of 
applying the doctrine of chances to the art of conjecture; and although we find applications 
to the assessment of the credibility of witnesses as early as 1697, it was not until Bayes’ 
time (1763) that it was also applied to the acceptability of hypotheses. The resulting 
confusion, as is well known, has existed ever since and at the present time seems, if 
anything, to be getting worse. If any justification for the study of the history of probability 
and statistics were required, it would be found simply and abundantly in this, that 
a knowledge of the development of the subject would have rendered superfluous much of 
what has been written about it in the last thirty years. 


34. Aquinas does not give a definition of probabilitas, but refers to Aristotle ; ‘ Probabilia 
sunt quae videntur omnibus, aut plerisque, aut sapientibus, et his vel omnibus vel 


* The same idea, of course, has come down to modern times in a line of direct descent but in a less 
deistic form; e.g. Spinoza, writing in 1677, says ‘for a thing cannot be called contingent unless with 
reference to a deficiency in our knowledge’; D’Alembert, in 1750, says: ‘il n’y a point de hasard 
à proprement parler mais il y a son équivalent: l’ignorance où nous sommes des vraies causes des 
événements.’ More recently Paul Lévy in 1939: ‘Nous pensons, quoique depuis les travaux d’Heisen- 
berg d’éminents savants ne soient pas de cet avis, que la notion du hasard est une notion que le savant 
introduit parce qu’elle est commode et féconde, mais que la nature ignore.’ 
plurimis maxime nobilibus et probatis.’ St Thomas himself regarded probabilitas as a 
quality which gave rise to an opinion. He says explicitly that it admits of degrees. Of 
chance (casus), he says: ‘Ea quae accidunt semper vel frequenter non sunt casualia neque 
fortuita, sed quae accidunt in paucioribus.’ And again ‘Sicut in rebus naturalibus in his 
quae ut in pluribus agunt, gradus quidam attenditur quia quanto virtus naturae est 
fortior, tanto rarius deficit a suo effectu, ita et in processu rationis qui non est cum 
omnimoda certitudine, gradus aliquis invenitur, secundum quod magis et minus ad 
perfectam certitudinem acceditur.’ As I understand the position, St Thomas recognized 
that probabilitas preceded certainty in the formation of knowledge; and that the frequency 
of events had something to do with the ‘fortuitous’ nature of the causality and the relative 
intensity of the underlying cause. But I cannot see in his writings an explicit statement 
that frequentia increased probabilitas or that the two were very closely related. The point 
really needs a deeper study than I felt worth while to give it, but it seems plain to me that 
the doctrine of chances was not present to his mind when probabilitas was under discussion. 


35. A good case can be made for the thesis that the religious attitude of the times 
discouraged by implication the development of a study of random behaviour. Even this 
does not entirely satisfy me as the complete explanation, but I think it very likely that 
before the Reformation the feeling that every event, however trivial, happened under 
Divine providence may have been a severe obstacle to the development of a calculus of 
chances. It seems to have taken humanity several hundred years to accustom itself to 
a world wherein some events were without cause; or, at least, wherein large fields of events 
were determined by a causality so remote that they could be accurately represented by 
a non-causal model.* And, indeed, humanity as a whole has not accustomed itself to the 
idea yet. Man in his childhood is still afraid of the dark, and few prospects are darker than 
the future of a universe subject only to mechanistic law and to blind chance. 

Whatever the reasons may be, it appears undeniable that the doctrine of chances took 
a remarkably long time to develop. Once launched, of course, it proceeded very rapidly; 
there is only a hundred years between Bernoulli’s Ars Coniectandi and Laplace’s Traité. 
But the results of that century of discovery required several thousand years of germination. 
Until more intensive research may have been able to lay bare the early essays and modes 
of thought of scientists in the fifteenth, sixteenth and seventeenth centuries the birth 
process of the probability calculus must remain somewhat enigmatic. 


36. I have to express acknowledgements to several colleagues for help in tracing 
references; especially to Prof. Corrado Gini, to Father Dionisio Pacetti, O.F.M., whose 
authoritative knowledge of the works of St Thomas Aquinas was put freely at my disposal, 
to Prof. Sixtos Rios for some information about the game of primero, to Prof. W. Rose 
and to Mr G. Woledge for some references to early German and French gaming, and to 


* As is well known, the assurance of lives was forbidden in several countries under Roman Catholic 
influence in the sixteenth century; e.g. in the statutes of Genoa for 1588: ‘sine licentia Senatus non 
possunt fieri securitate...super vitam Pontificis neque super vitam...aliorum dominorum aut 
personarum ecclesiasticarum.’ It is rather remarkable, but I suppose accidental, that nearly all the 
chief early writers on the probability calculus were subject to Catholic persecution. Cardano and 
Galileo were both victims of the Inquisition. The Bernoulli family were exiled in Switzerland, having 
been driven from Antwerp by Spanish persecution in the Netherlands. De Moivre lived in England 
because of the revocation of the Edict of Nantes. Pascal was a member of Port-Royal, though he did 
not live to see Jansenism driven into exile. Fermat and Montmort escaped; but Fermat lived in the 
provinces and published nothing on probability, and Montmort also lived a retired life. 
Dr F. N. David, with whom I have had many discussions on this fascinating subject and 
who read this article in manuscript. 


APPENDIX 
Extract from ‘De Vetula’ 


Forte tamen dices, quosdam praestare quibusdam 
Ex numeris, quibus est lusoribus usus, eo quod 
Cum decius sit sex laterum, sex & numerorum 
Simplicium, tribus in deciis sunt octo decemque, 
Quorum non nisi tres possunt deciis superesse. 
Hi diversimode variantur, & inde bis octo 
Compositi numeri nascuntur, non tamen aequae 
Virtutis, quoniam majores atque minores 
Ipsorum raro veniunt, mediique frequenter, 

Et reliqui, quanto mediis quamvis propiores, 
Tanto praestantes, & saepius advenientes. 

His punctatura tantum venientibus una, 

Illis sex, aliis mediocriter inter utrosque, 

Sicut sint duo majores, totidemque minores, 
Una quibus sit punctatura, duoque sequentes, 
Hic major, minor ille, quibus sit bina duobus. 
Rursum post istos sit terna, deinde quaterna, 
Quinaque, sicut eis succedunt appropiando 
Quattuor ad medios, quibus est punctatio sena, 
Quae reddet leviora tibi subjecta tabella. 


Hi sunt sex & quinquaginta modi veniendi, 

Nec numerus minor esse potest, vel major, eorum. 
Nam quando similes fuerint sibi tres numeri, qui 
Jactum componunt quia sex componibiles sunt, 
Et punctaturae sunt sex, pro quolibet una. 

Sed cum dissimilis aliis est unus eorum, 

Atque duo similes, triginta potest variari 
Punctatura modis, quia, si duplicaveris ex se 
Quemlibet, adjuncto reliquorum quolibet, inde 
Producens triginta, quasi sex quintuplicabis. 
Quod si dissimiles fuerint omnino sibi tres, 

Tunc punctaturas viginti connumerabis. 

Hoc ideo, quia continui possunt numeri tres 
Quattuor esse modis; discontinui totidem: sed 
Si duo continui fuerint, discontinuusque 

Tertius invenies hinc tres bis, & inde duos ter: 
Quod tibi declarat oculis subjecta figura. 


Rursum sunt quaedam subtilius inspicienti 

De punctaturis, quibus una cadentia tantum est; 
Suntque; quibus sunt tres aut sex quia schema cadendi 
Tunc differre nequit, quando similes fuerint tres 
Praedicti numeri. Si vero sit unus eorum 

Dissimilis, similisque duo, tria schemata surgunt, 
Dissimili cuicunque superposito deciorum. 

Sed si dissimiles sunt omnes, invenies sex 

Verti posse modis, quia, quemlibet ex tribus uni 
Cum dederis, reliqui duo permutant loca; sicut 
Punctaturarum docet alternatio. Sicque 
Quinquaginta modis et sex diversificantur 

In punctaturis, punctaturaeque ducentis 

Atque bis octo cadendi schematibus, quibus inter 
Compositos numeros, quibus est lusoribus usus, 
Divisis, prout inter eos sunt distribuendi 

Plene cognosces, quantae virtutis eorum 

Quilibet esse potest, seu quantae debilitatis: 
Quod subscripta potest tibi declarare figura. 


ON ESTIMATING THE LATENT AND INFECTIOUS 
PERIODS OF MEASLES 


I. FAMILIES WITH TWO SUSCEPTIBLES ONLY 


By NORMAN T. J. BAILEY 
Design and Analysis of Scientific Experiment, 6 Keble Road, Oxford 


1. INTRODUCTION 


The analysis of the household distribution of cases of measles by means of chain-binomial 
models has been comparatively successful (see Bailey (1955) for discussion and bibliography). 
At the same time considerable variation in the time interval between successive cases has 
usually been observed, and this sometimes leads to difficulties in identifying the links of the 
chain. An attempt was therefore made to produce a model which would take these variations 
into account. The simplest feasible arrangement is to assume that after the receipt of 
infection there follows a latent period which is approximately normally distributed. Then 
comes a period of infectiousness which is effectively terminated after a constant time by the 
appearance of symptoms and removal of the individual concerned from circulation. The 
latent and infectious periods taken together constitute what is usually called the incubation 
period. Arguments in favour of this model have been set out in detail elsewhere (see Bailey 
(1954, 1955), to which reference should be made for a fuller discussion). Simple estimates of 
the four parameters involved were given for families with two susceptibles only. However, 
at least two of the large sample variances were high, and there was considerable doubt as to 
the efficiency achieved. The purpose of the present paper is accordingly to develop a 
maximum-likelihood scoring procedure which will give efficient estimates and also make 
available a goodness-of-fit test. 


2. MATHEMATICAL MODEL AND SAMPLING DISTRIBUTIONS 


Let us suppose that the latent period, x, is normally distributed with mean m and variance 
$\sigma^2$, while the ensuing infectious period is of constant length a. Infection of the second 
susceptible during this time is taken to be a Poisson process such that the chance of 
contracting the disease in time $dt$ is $\lambda\,dt$. 

Suitable material for analysis (kindly made available to me by Dr R. E. Hope Simpson) 
is shown in Table 1. This is based on families with two susceptible children under 15 years 
of age and at least one case of measles, and was taken from the Cirencester area over the 
years 1946-52. The distribution appears to involve two distinct parts which overlap a little. 
Distribution A containing A families is considered to arise from both susceptibles having 
been simultaneously infected by an outside contact. The B families in the B-distribution 
are taken to be examples of cross-infection within the family. We shall assume for the time 
being that observations can be allotted to the correct distribution with complete accuracy. 
In practice an arbitrary decision (as in Table 1) may have to be made about borderline cases. 
A refinement of analysis is to introduce and estimate an additional parameter, the prior 
chance that a family belongs to, say, distribution A. The kind of procedure required is 
outlined below in §6. There are C families with only one case, and a total of N = A+B+C. 
Let w be the variable for distribution A. Since this is the absolute difference between two 
independent latent periods the frequency distribution of w is 


$$f(w) = \frac{1}{\sigma\sqrt{\pi}}\exp\{-w^2/(4\sigma^2)\} \qquad (0 < w < \infty). \tag{1}$$
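
As a numerical check on (1), a minimal Python sketch is given below. It assumes only that the two latent periods are independent normal variables with common variance $\sigma^2$; the helper `trapezoid`, the function names and the value $\sigma = 1.5$ are illustrative choices and are not taken from the paper. The density should integrate to unity and have second moment $2\sigma^2$ about the origin.

```python
import math

def f_w(w, sigma):
    """Density (1) of w = |x1 - x2| for two independent N(m, sigma^2) latent periods."""
    return math.exp(-w * w / (4.0 * sigma * sigma)) / (sigma * math.sqrt(math.pi))

def trapezoid(g, lo, hi, n=20000):
    """Composite trapezoidal rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    inner = sum(g(lo + i * h) for i in range(1, n))
    return h * (0.5 * g(lo) + inner + 0.5 * g(hi))

sigma = 1.5  # illustrative value only
total = trapezoid(lambda w: f_w(w, sigma), 0.0, 30.0)
second_moment = trapezoid(lambda w: w * w * f_w(w, sigma), 0.0, 30.0)

print(total, second_moment, 2.0 * sigma ** 2)
```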


Next, consider the B+C families with either both a primary and a secondary case (B) or 
a single primary case only (C). The chance of the second susceptible escaping infection by the 
first case during the infectious period a is $e^{-\lambda a}$. Hence the probability of the observed 
numbers, B and C, given B+C, is 


$$\binom{B+C}{C}\, e^{-C\lambda a}\,(1-e^{-\lambda a})^{B}. \tag{2}$$


Further, the distribution of the epoch $\tau$, measuring from the beginning of the infectious 
period, at which infection of the second case actually takes place, is evidently 


$$f(\tau) = \lambda e^{-\lambda\tau}\,(1-e^{-\lambda a})^{-1} \qquad (0 < \tau < a). \tag{3}$$


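Let distribution B arise from a variable $z = x+\tau$, where of course x has the frequency 
function 

$$f(x) = (2\pi\sigma^2)^{-1/2}\exp\{-(x-m)^2/(2\sigma^2)\} \qquad (-\infty < x < \infty). \tag{4}$$

The frequency distribution of z is obtained by using (3) and (4) to write down the joint 
frequency distribution for x and $\tau$, replacing x by $z-\tau$, and integrating out $\tau$. This gives 

$$f(z) = \frac{\lambda\, e^{-\lambda(z-m-\frac12\lambda\sigma^2)}}{1-e^{-\lambda a}} \int_{u'}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 t^2}\,dt, \tag{5}$$

where 

$$u = \sigma^{-1}\{z-(m+\lambda\sigma^2)\} \quad\text{and}\quad u' = u - a\sigma^{-1}. \tag{6}$$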
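
The form of (5) can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than part of the original computations: the function names and the integration grid are arbitrary, and the parameter values are simply the rounded maximum-likelihood estimates quoted in §4. It confirms that f(z) integrates to unity and that its mean and variance agree with those of a normal latent period plus an independent truncated-exponential epoch of infection.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_z(z, lam, a, m, sigma):
    """Density (5) of z = x + tau, with the limits u and u' defined in (6)."""
    u = (z - (m + lam * sigma ** 2)) / sigma
    u1 = u - a / sigma
    R = Phi(u) - Phi(u1)
    scale = lam / (1.0 - math.exp(-lam * a))
    return scale * math.exp(-lam * (z - m - 0.5 * lam * sigma ** 2)) * R

def moments(lam, a, m, sigma, lo=-10.0, hi=50.0, n=60000):
    """Total mass, mean and variance of f_z by the trapezoidal rule."""
    h = (hi - lo) / n
    s0 = s1 = s2 = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        f = weight * f_z(z, lam, a, m, sigma)
        s0 += f
        s1 += z * f
        s2 += z * z * f
    mean = s1 * h
    return s0 * h, mean, s2 * h - mean ** 2

lam, a, m, sigma = 0.256, 6.57, 8.58, 1.77  # rounded estimates quoted in the text
total, mean, var = moments(lam, a, m, sigma)

# Theoretical mean and variance: normal part plus truncated exponential on (0, a).
e = math.exp(lam * a)
mean_theory = m + 1.0 / lam - a / (e - 1.0)
var_theory = sigma ** 2 + 1.0 / lam ** 2 - a ** 2 * e / (e - 1.0) ** 2

print(total, mean, mean_theory, var, var_theory)
```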


We shall also need the sample mean, $\bar{z}$, and variance, v, of the B-distribution, together 
with the observed second moment about the origin, $V = \sum w^2/A$, for the A-distribution. 


3. MAXIMUM-LIKELIHOOD SCORING 
Using the three frequency functions given in (1), (2) and (5) above we can proceed in the 
usual way to derive maximum-likelihood scores and information functions for the 
parameters $\lambda$, a, m and $\sigma$. In order to do this as concisely as possible let us first write 

$$R(z) = \int_{u'}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 t^2}\,dt, \tag{7}$$

and 

$$T_\theta = \frac{1}{R}\frac{\partial R}{\partial\theta} \qquad (\theta = \lambda, a, m, \sigma). \tag{8}$$


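Amalgamating the three contributions to any score then gives quite simply 

$$\begin{aligned}
S_\lambda &= \partial L/\partial\lambda = B(m-\bar{z}+\lambda^{-1}+\lambda\sigma^2) - Ca + \sum_z T_\lambda,\\
S_a &= \partial L/\partial a = -C\lambda + \sum_z T_a,\\
S_m &= \partial L/\partial m = B\lambda + \sum_z T_m,\\
S_\sigma &= \partial L/\partial\sigma = A\sigma^{-1}(\tfrac12 V\sigma^{-2}-1) + B\lambda^2\sigma + \sum_z T_\sigma,
\end{aligned} \tag{9}$$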
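where L is as usual the log likelihood. The quantities $T_\theta$ are most easily calculated from the 
functions 

$$Q(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2} \quad\text{and}\quad P(x) = \begin{cases}\displaystyle\int_{-x}^{x} Q(t)\,dt & (x > 0),\\[4pt] -P(|x|) & (x < 0),\end{cases} \tag{10}$$

using the tables published by the New York W.P.A. (1942). We then have 

$$\begin{aligned}
R &= \tfrac12(P-P'), & \partial R/\partial\lambda &= -\sigma(Q-Q'),\\
\partial R/\partial a &= \sigma^{-1}Q', & \partial R/\partial m &= -\sigma^{-1}(Q-Q'),\\
\partial R/\partial\sigma &= -\sigma^{-1}(uQ-u'Q') - 2\lambda(Q-Q'), &&
\end{aligned} \tag{11}$$

where $P = P(u)$, $P' = P(u')$, $Q = Q(u)$, $Q' = Q(u')$. 
The $T_\theta$ are then obtained from (8), (10) and (11). 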
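
The derivatives of R can be verified against finite differences. The sketch below is illustrative only: the function names, the dictionary keys, the step h and the trial point z = 11 are assumptions made for the example, and `math.erf` stands in for the printed tables. It evaluates R and its four partial derivatives from Q and P, forms the quantities $T_\theta = R^{-1}\,\partial R/\partial\theta$, and compares each analytic derivative with a central difference.

```python
import math

def Q(x):
    """Standard normal ordinate."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def P(x):
    """Integral of Q from -x to x; an odd function of x, as in (10)."""
    return math.erf(x / math.sqrt(2.0))

def limits(z, lam, a, m, sigma):
    """The limits u and u' defined in (6)."""
    u = (z - (m + lam * sigma ** 2)) / sigma
    return u, u - a / sigma

def R(z, lam, a, m, sigma):
    u, u1 = limits(z, lam, a, m, sigma)
    return 0.5 * (P(u) - P(u1))

def dR(z, lam, a, m, sigma):
    """Analytic partial derivatives of R, keyed by parameter name."""
    u, u1 = limits(z, lam, a, m, sigma)
    q, q1 = Q(u), Q(u1)
    return {
        "lam": -sigma * (q - q1),
        "a": q1 / sigma,
        "m": -(q - q1) / sigma,
        "sigma": -(u * q - u1 * q1) / sigma - 2.0 * lam * (q - q1),
    }

def T(z, lam, a, m, sigma):
    """T_theta = (1/R) dR/dtheta for each of the four parameters."""
    r = R(z, lam, a, m, sigma)
    return {name: d / r for name, d in dR(z, lam, a, m, sigma).items()}

def numeric_dR(z, theta, name, h=1e-6):
    """Central-difference approximation to dR/dtheta[name]."""
    up, down = dict(theta), dict(theta)
    up[name] += h
    down[name] -= h
    return (R(z, **up) - R(z, **down)) / (2.0 * h)

theta = {"lam": 0.256, "a": 6.57, "m": 8.58, "sigma": 1.77}
z = 11.0  # arbitrary trial observation
analytic = dR(z, **theta)
max_err = max(abs(analytic[k] - numeric_dR(z, theta, k)) for k in theta)
print(max_err, T(z, **theta))
```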

The derivation of information functions also goes in a straightforward way. A minor 
point worth mentioning is that if we differentiate $T_\theta$ with respect to one of the parameters, 
say $\phi$, we obtain 

$$\frac{\partial T_\theta}{\partial\phi} = \frac{1}{R}\frac{\partial^2 R}{\partial\theta\,\partial\phi} - T_\theta T_\phi.$$

The expectation of $R^{-1}\,\partial^2 R/\partial\theta\,\partial\phi$ is easily found, since when multiplying by the frequency 
function in (5) the factors R in numerator and denominator cancel, and the integration with 
respect to z gives no special difficulty. With $T_\theta T_\phi$, on the other hand, the integrand involves 
a factor $R^{-1}$, and it seems best to leave these terms as observed quantities. In any case the 
individual values of $T_\theta$ and $T_\phi$ for each z have already been calculated in finding the scores. 
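The information matrix, I, for the parameters $\lambda$, a, m and $\sigma$, in that order, then turns 
out to be the symmetric matrix whose upper triangle is 

$$I = \begin{pmatrix}
\sum T_\lambda^2 - \mathcal{B}(\lambda^2\sigma^4+\sigma^2-\lambda^{-2}) & \sum T_\lambda T_a + \dfrac{\mathcal{B}(1+\lambda^2\sigma^2)}{e^{\lambda a}-1} & \sum T_\lambda T_m - \mathcal{B}(1+\lambda^2\sigma^2) & \sum T_\lambda T_\sigma - \mathcal{B}\lambda^3\sigma^3 \\[6pt]
 & \sum T_a^2 + \dfrac{\mathcal{B}\lambda^2}{e^{\lambda a}-1} & \sum T_a T_m + \dfrac{\mathcal{B}\lambda^2}{e^{\lambda a}-1} & \sum T_a T_\sigma + \dfrac{\mathcal{B}\lambda^3\sigma}{e^{\lambda a}-1} \\[6pt]
 & & \sum T_m^2 - \mathcal{B}\lambda^2 & \sum T_m T_\sigma - \mathcal{B}\lambda^3\sigma \\[6pt]
 & & & \sum T_\sigma^2 - \mathcal{B}\lambda^4\sigma^2 + 2A\sigma^{-2}
\end{pmatrix} \tag{12}$$

where 

$$\mathcal{B} = (B+C)(1-e^{-\lambda a}). \tag{13}$$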


Writing S for the vector of scores calculated at trial values given by the vector $\theta$, we calculate 
approximate maximum-likelihood values, $\theta_1$, given by 

$$\theta_1 = \theta + I^{-1}S, \tag{14}$$

as is well known. The procedure is then repeated using $\theta_1$ for the trial values until sufficient 
accuracy is obtained. 
To obtain initial trial values it is convenient to use the approximate estimates given in 
a previous paper (Bailey, 1955), namely, 

$$\begin{aligned}
\lambda &= (1-T)^{1/2}\,U^{-1/2},\\
a &= \lambda^{-1}\log(1+Y),\\
m &= \bar{z} - \lambda^{-1} + aY^{-1},\\
\sigma &= (\tfrac12 V)^{1/2},
\end{aligned} \tag{15}$$

where 

$$\begin{aligned}
T &= Y^{-1}(1+Y^{-1})\{\log(1+Y)\}^2,\\
U &= v - \tfrac12 V,\\
Y &= B/C.
\end{aligned} \tag{16}$$


If it should happen that in any set of data A were small or zero then no satisfactory initial 
estimate of $\sigma$ would be available from (15). However, it is still possible to obtain rough 
estimates by setting B equal to its expectation, leading to 

$$\lambda a = \log(1+Y) = F, \tag{17}$$

and putting the first three sample cumulants in the B-distribution equal to their 
expectations, viz. 

$$\begin{aligned}
\kappa_1 &= m + \lambda^{-1} - a(e^{\lambda a}-1)^{-1} = \bar{z},\\
\kappa_2 &= \sigma^2 + \lambda^{-2} - a^2 e^{\lambda a}(e^{\lambda a}-1)^{-2} = v,\\
\kappa_3 &= 2\lambda^{-3} - a^3 e^{\lambda a}(e^{\lambda a}+1)(e^{\lambda a}-1)^{-3} = m_3.
\end{aligned} \tag{18}$$

Using (17) and (18), we can solve for the required estimates successively as follows: 

$$\begin{aligned}
a^3 &= m_3[2F^{-3} - Y^{-3}(1+Y)(2+Y)]^{-1},\\
\sigma^2 &= v - a^2[F^{-2} - Y^{-2}(1+Y)],\\
m &= \bar{z} - a(F^{-1} - Y^{-1}),\\
\lambda &= Fa^{-1}.
\end{aligned} \tag{19}$$

4. ILLUSTRATIVE EXAMPLE 


Let us now apply the foregoing maximum-likelihood scoring procedure to the data exhibited 
in Table 1. Examination of the observed frequencies shows immediately that in attempting 
to fit any reasonably smooth curve to B we are likely to be troubled by an apparent excess 
of observations for days 7 and 14 (with corresponding deficits for adjacent days). This 
could be due to a small unconscious bias towards an integral number of weeks. A similar 
situation is not uncommon in other fields, e.g. interviewees may show a preference for ages 
ending in 0 or 5, and measurements of blood pressure sometimes show a marked excess of 
readings which are even multiples of 10. One way of offsetting such bias is to pool the 
frequencies for, say, days 6, 7 and 8 in one group, and for 13, 14 and 15 in another group, 
when carrying out a goodness-of-fit test. 

Preliminary estimates were obtained from (15). These have already been given in an 
earlier paper (Bailey, 1955) together with their large sample variances, and are 


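$$\begin{aligned}
\lambda &= 0.203 \pm 0.083,\\
a &= 8.13 \pm 3.28 \text{ days},\\
m &= 7.94 \pm 0.87 \text{ days},\\
\sigma &= 1.32 \pm 0.17 \text{ days}.
\end{aligned} \tag{20}$$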
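
The preliminary estimates can be recomputed from the observed frequencies of Table 1. The Python sketch below is an illustration under two stated assumptions: the day-5 entry of the B column is read as 2 (which makes the column sum to the stated 190), and the sample variance v is taken with divisor B − 1. The variable names are hypothetical. Under these assumptions the calculation agrees with the values in (20) to the accuracy printed.

```python
import math

# Observed numbers of families from Table 1, keyed by interval in days.
A_counts = {0: 5, 1: 13, 2: 5, 3: 4, 4: 2}
B_counts = {4: 1, 5: 2, 6: 4, 7: 11, 8: 5, 9: 25, 10: 37, 11: 38, 12: 26,
            13: 12, 14: 15, 15: 6, 16: 3, 17: 1, 18: 3, 21: 1}  # day 5 read as 2 (assumption)
C = 45  # families with one case only

A = sum(A_counts.values())
B = sum(B_counts.values())

# Second moment about the origin for distribution A; mean and variance for B.
V = sum(w * w * n for w, n in A_counts.items()) / A
z_bar = sum(z * n for z, n in B_counts.items()) / B
v = sum(n * (z - z_bar) ** 2 for z, n in B_counts.items()) / (B - 1)  # divisor B - 1 (assumption)

# Auxiliary quantities of (16) and the estimates of (15).
Y = B / C
F = math.log(1.0 + Y)
T = (1.0 + Y) / Y ** 2 * F ** 2
U = v - 0.5 * V

lam0 = math.sqrt((1.0 - T) / U)
a0 = F / lam0
m0 = z_bar - 1.0 / lam0 + a0 / Y
sigma0 = math.sqrt(0.5 * V)

print(round(lam0, 3), round(a0, 2), round(m0, 2), round(sigma0, 2))
```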


The estimates of both $\lambda$ and a appear to have rather low precision, and we should not be 
surprised if the maximum-likelihood values are appreciably different and much more 
accurate. After one cycle of successive approximation the second set of estimates did in fact 
show large changes. When the third set of values was obtained the information matrix was 
recalculated, and convergence was then rapid. Stability was practically achieved with the 
fourth approximation; a final fifth stage gave only very small further corrections. The 
maximum-likelihood estimates found in this way are 

$$\begin{aligned}
\hat\lambda &= 0.256 \pm 0.032,\\
\hat a &= 6.57 \pm 0.76 \text{ days},\\
\hat m &= 8.58 \pm 0.32 \text{ days},\\
\hat\sigma &= 1.77 \pm 0.13 \text{ days}.
\end{aligned} \tag{21}$$

There is thus a striking gain in efficiency compared with the preliminary estimates appearing 
in (20), and the additional labour in computation is evidently worth while. 
When carrying out the usual goodness-of-fit test it is convenient to use the last 
approximation but one, if sufficiently accurate, since we need not then recalculate the R(z) in 
computing the function given by (5). In practice, of course, the data will normally be 
grouped in units of 1 day, as in Table 1. The fitted values for distribution B were obtained 

Table 1. Observed and expected values for Hope Simpson's data on 
measles in families with two susceptibles 

Time interval            Observed no. of families 
between 2 cases          ------------------------       Expected no. 
in days                    A        B      Total 

    0                      5        .        5              4.67 
    1                     13        .       13              8.58 
    2                      5        .        5              6.73 
    3                      4        .        4              4.53 
    4                      2        1        3              2.78  } 
    5                      .        2        2              2.25  } 
    6                      .        4        4              3.97 
    7                      .       11       11              8.85 
    8                      .        5        5             16.63 
    9                      .       25       25             24.72 
   10                      .       37       37             29.44 
   11                      .       38       38             29.28 
   12                      .       26       26             25.44 
   13                      .       12       12             19.99 
   14                      .       15       15             14.28 
   15                      .        6        6              9.02 
   16                      .        3        3              4.82 
   17                      .        1        1              2.09 
   18                      .        3        3              0.71 
   19                      .        .        .              0.18 
   20                      .        .        .              0.04 
   21                      .        1        1              0.00 

Sub-totals                29      190      219            219.00 

One case only (C)                           45             44.11 
Primary and secondary (B)                  190            190.89 

Overall total (A+B+C)                      264            264 

merely by calculating f(z) at the mid-point of each interval, as the small additional accuracy 
resulting from integration over the interval was thought not worth the extra labour of 
computation. On the other hand, integrated values for distribution A are immediately 
available from the tabulated values of P(x) in (10). It can be seen from Table 1 that the 
agreement between observed and expected values is on the whole quite good. For the usual 
goodness-of-fit test the classes bracketed together have been pooled so as to avoid small 
expectations. There are sixteen classes for the combined A- and B-distribution, giving 
15 d.f.; and two classes, giving 1 d.f., for the numbers B and C. From the total of 16 d.f. we 
must remove 4 to allow for the parameters estimated. We find an overall $\chi^2$ of 20.3 on 
12 d.f. As the 5% point is at 21.0, we can regard the fit as just adequate. Actually, we have 
already remarked the possibility of unconscious bias in the records producing local peaks 
at 7 and 14 days, and suggested that it could be minimized by amalgamating the frequencies 
for 6, 7 and 8 days in one group, and 13, 14 and 15 in another. When this is done we obtain 
a $\chi^2$ of 12.9 on 8 d.f., which is entirely satisfactory since the 10% point is at 13.4. 
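
The significance levels quoted can be confirmed without tables, since for an even number of degrees of freedom the upper tail of $\chi^2$ has a closed form as a finite Poisson sum. The short Python sketch below uses that identity; the function name `chi2_sf_even` is a hypothetical helper introduced for this illustration.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail probability of chi-square with an even number of d.f.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("positive even degrees of freedom only")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_all = chi2_sf_even(20.3, 12)     # ungrouped comparison
p_pooled = chi2_sf_even(12.9, 8)   # after pooling days 6-8 and 13-15
print(round(p_all, 3), round(p_pooled, 3))
```

The first probability lies a little above 0.05 and the second a little above 0.10, in agreement with the percentage points cited.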


Table 2. Efficiencies (in percentage) of estimates with parts of data absent 

Data available                       $\lambda$      a       m      $\sigma$ 

All three sources present             100     100     100     100 
No double primaries (A = 0)            79      70      77      60 
No single cases (C = 0)                23     100      83      99 
B-distribution only (A = 0 = C)        22      70       6      59 

It is also of some interest to see what would happen if certain portions of the data were 
missing. There may, for example, be no families with a double primary, i.e. A = 0. All we 
have to do is to remove the term $2A\sigma^{-2}$ from $I_{\sigma\sigma}$ in (12) before inverting the information 
matrix. Alternatively, there may be no record of C, the number of families with a single 
case only. No information about $\lambda$ and a is then available from the frequency distribution 
in (2). Accordingly, we must remove from $I_{\lambda\lambda}$, $I_{\lambda a}$ and $I_{aa}$ the contributions $\beta a^2$, $\beta a\lambda$ and 
$\beta\lambda^2$, respectively, where $\beta = (B+C)(e^{\lambda a}-1)^{-1}$. Again, both these items may be missing 
and we may only have data on families with a primary and a secondary case. These results 
are summarized in Table 2, which shows the appropriate efficiencies derived by comparing 
the variances of the estimates in each case with the ‘best’ values given by the squares of the 
standard errors appearing in (21). It is worth noticing that reasonably efficient estimates 
of all four parameters can be obtained in the absence of double primary data, but that 
knowledge of the number of families with a single case is essential for an efficient 
determination of $\lambda$. Without the B-distribution, of course, little information of value would be 
forthcoming. 


5. EFFECT OF VARIATIONS IN $\lambda$ 


It has been shown elsewhere (Bailey, 1953) that, so far as measles data for Providence, 
Rhode Island, were concerned, chain binomials gave a satisfactory goodness-of-fit when 
analysed stage by stage only on the assumption that the chance of cross-infection, p, varied 
between families. In the notation of this paper we have 


$$p = 1-e^{-\lambda a}. \tag{22}$$
It follows that we ought in the present context to consider the possibility of variations in $\lambda$. 
We could hardly expect to make any precise estimate of parameters in the distribution 
of $\lambda$ from families with only two susceptibles, but it is worth making a rough estimate of the 
likely consequences. We can do this by calculating the expectation, with respect to 
appropriate variations in $\lambda$, of the expected frequencies in the model. Now the distribution 
chosen for p was 

$$\frac{1}{\mathrm{B}(x,y)}\,p^{x-1}(1-p)^{y-1}\,dp \qquad (0 < p < 1), \tag{23}$$

and the pooled estimates of x and y, based on households of three and four, were 1.18 and 
0.28 respectively. The mean and variance of p were thus 

$$\bar{p} = x/(x+y) = 0.81, \qquad v_p = xy/\{(x+y)^2(x+y+1)\} = 0.063. \tag{24}$$
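
The arithmetic here is easily reproduced. The following Python lines are a sketch with illustrative variable names: they evaluate the mean and variance of the distribution (23) at the pooled estimates x = 1.18 and y = 0.28, and compare the mean with the value of p obtained from (22) at the rounded estimates of $\lambda$ and a in (21).

```python
import math

x, y = 1.18, 0.28  # pooled estimates quoted in the text

# Mean and variance of the beta-type distribution (23).
p_mean = x / (x + y)
p_var = x * y / ((x + y) ** 2 * (x + y + 1.0))

# Chance of cross-infection from (22), at the rounded estimates in (21).
lam, a = 0.256, 6.57
p_model = 1.0 - math.exp(-lam * a)

print(round(p_mean, 3), round(p_var, 3), round(p_model, 3))
```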

Using the estimates of $\lambda$ and a in (21) we see that the value of p given by (22) is about 0.81, 
the same as the estimate in (24) taken from the Providence data. If we now suppose p to 
vary about this mean value the expected frequencies for the observations B and C will 
remain unchanged. The A-distribution will also be the same since it involves only $\sigma$. 
However, the B-distribution given in (5) will be modified, and may be replaced, 
approximately, by 

$$f(z) + \frac{v_p}{2}\frac{\partial^2 f}{\partial p^2} = f(z) + \frac{v_p\,e^{2\lambda a}}{2a^2}\left(\frac{\partial^2 f}{\partial\lambda^2} + a\frac{\partial f}{\partial\lambda}\right), \tag{25}$$

where we have simply expanded in powers of $\delta p = p-\bar{p}$ and have taken expectations, 
neglecting terms of third and higher order in $\delta p$. It may be noted that the additional 
quantities on the right-hand side of (25) are all easily calculated from those already available. 
It turns out that the net result is to flatten and displace slightly towards the origin the peak 
of the fitted curve. The $\chi^2$ values are a little higher. The data grouped so as to minimize the 
peaks at 7 and 14 days give a $\chi^2$ of 14.3 on 8 d.f., which is still satisfactory. Without this 
adjustment we obtain 21.5 on 12 d.f., which is just significant at the 5% level. However, 
the variations in p (or $\lambda$) are unlikely to be as large as this in Hope Simpson’s data, which we 
should expect to be more homogeneous than the Providence material. We are entitled to 
conclude that our results would probably not be appreciably influenced by only moderate 
variations in the chance of infection. 

A somewhat similar analysis can be undertaken in respect of variations in the length of 
the infectious period, a, but this is more complicated in that both the A- and B-distributions 
are affected, and each involves considering two independently variable infectious periods. 
Moreover, with measles at any rate, there is little scope for introducing any substantial 
variations of a into the present model. This can be seen from the fact that if the variance of 
a were v, the expected second moment about the origin of the A-distribution would be 
207+ 2v,, and the observed value only 3-48. However, it can be shown that even if v, 
were, say, approximately unity, although the effect on the goodness-of-fit would be very 
small so far as the A-distribution is concerned, there would be an appreciable flattening of 
the B-curve. This is clearly a matter that requires further investigation. 


(24) 
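The arithmetic behind the mean and variance in (24) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `beta_mean_var` is an assumption of this example, and it simply applies the standard moments of a beta-type distribution with indices x and y to the pooled estimates quoted in the text.

```python
def beta_mean_var(x, y):
    """Mean and variance of a beta-type distribution with indices x and y."""
    s = x + y
    mean = x / s
    var = x * y / (s * s * (s + 1.0))
    return mean, var


# Pooled estimates of x and y quoted in the text.
mean_p, var_p = beta_mean_var(1.18, 0.28)
print(round(mean_p, 2), round(var_p, 3))
```

The variance is large relative to the range of p, which is why the text treats it as an upper limit on the variation to be expected in more homogeneous material.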


6. ALLOWANCE FOR MISCLASSIFICATION OF CHAINS 
So far we have assumed that the basic chains of the mathematical model (almost trivial for
families of two) can be correctly identified. For the class with only one case there is no
possibility of error. When there are two cases, however, and the A- and B-distributions
overlap, then there is a definite chance of misclassification. In the example discussed in §5,
where the two distributions overlap only to a very small extent, the effect is likely to be only
a minor one confined to a few borderline observations. With more substantial overlapping
it is desirable to make proper allowance for it in the analysis. This can be done by introducing
a new parameter, ξ = 1 − η, which is the prior probability that a family is of type A.
Maximum-likelihood scoring for the five unknown parameters, λ, m, a, σ and ξ, can then
be carried through as before. It is, however, considerably more complicated than the case
already discussed. For this reason an indication only of the modified procedure will be given.
A full treatment in extenso may be worth while elsewhere if data requiring it appear.

First consider the information available from the number of families, C, with one case
only. The expected proportion is clearly ηe^{−λa}, and this must now be referred to the whole
sample, N, and not merely to B + C as before. The analogue of (2) is thus
$$\binom{N}{C}\,\eta^{C}e^{-C\lambda a}\,(1-\eta e^{-\lambda a})^{N-C}. \qquad (26)$$

Next consider the N − C families which are liable to misclassification with regard to the
A- and B-distributions. If we write the frequency functions in (1) and (5) as f_1 and f_2, the
contribution to the likelihood from any family with two cases separated by an interval of
y days is
$$\frac{\xi f_1(y) + \eta(1-e^{-\lambda a})f_2(y)}{1-\eta e^{-\lambda a}}. \qquad (27)$$
Combining the contributions from (26) and (27) gives for the log likelihood
$$L = C\log\eta - C\lambda a + \sum_{y}\log\{\xi f_1(y) + \eta(1-e^{-\lambda a})f_2(y)\}, \qquad (28)$$
where the summation is taken over the N − C values of y.

Now suppose that there are A and B families that are almost certainly of types A and B,
respectively, and D families of uncertain classification. Almost certain classification means
that the value of y is such that one of the two frequencies f_1(y) and f_2(y) is negligibly small.
In doubtful cases it may be safer to take A = B = 0. For the A and B almost certain families
we therefore put f_2 = 0 and f_1 = 0, respectively. This leads to the modified expression
$$L = A\log\xi + (B+C)\log\eta - C\lambda a + B\log(1-e^{-\lambda a}) + \sum_{A}\log f_1 + \sum_{B}\log f_2 + \sum_{D}\log\{\xi f_1 + \eta(1-e^{-\lambda a})f_2\}. \qquad (29)$$


The expressions (28) or (29) can be used as a basis for the standard technique. Although 
the scores and information functions are now much more complicated, they contain certain 
components that are the same as in the simpler situation. 
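As a rough illustration of how the log likelihood (28) could be evaluated numerically, the sketch below takes the two frequency functions as arbitrary callables. The names `log_likelihood`, `f1` and `f2` are assumptions of this example; the actual forms of the A- and B-distributions are those of equations (1) and (5) of the paper and are not reproduced here.

```python
import math


def log_likelihood(c, intervals, lam, a, eta, f1, f2):
    """Log likelihood of the form (28), up to an additive constant.

    c         : number of families with one case only
    intervals : observed intervals y (days) in the two-case families
    eta       : 1 - xi, where xi is the prior probability of type A
    f1, f2    : placeholder callables for the A- and B-distributions
    """
    xi = 1.0 - eta
    escape = math.exp(-lam * a)  # chance that the second susceptible escapes
    total = c * math.log(eta) - c * lam * a
    for y in intervals:
        total += math.log(xi * f1(y) + eta * (1.0 - escape) * f2(y))
    return total
```

In practice this function would be handed to a numerical optimizer or differentiated to obtain scores; neither step is attempted here.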


I am particularly indebted to Dr R. E. Hope Simpson of the Cirencester Public Health 
Laboratory Service for allowing me to make use of his excellent epidemiological records, 
and also for many stimulating discussions. I should also like to thank Mrs Tamara Hazlewood 
for carrying out the computations on which the numerical results of this paper are based.

THE BEHAVIOUR OF AN ESTIMATOR FOR A SIMPLE BIRTH
AND DEATH PROCESS


By J. H. DARWIN 
Applied Mathematics Laboratory, D.S.I.R., New Zealand 


1. INTRODUCTION 


The simplest birth and death process is one in which there are constant probabilities λ dt
and μ dt of an individual of a population respectively giving birth to a new individual in
(t, t+dt) or dying in (t, t+dt). If there are N_0 individuals of the population alive at time 0,
the probability generating function (p.g.f.) of the number alive at time τ is
$$\left[\frac{\mu(\alpha-1)-(\mu\alpha-\lambda)z}{\lambda\alpha-\mu-\lambda(\alpha-1)z}\right]^{N_0} = \left[\frac{a+bz}{c-dz}\right]^{N_0}, \qquad (1)$$
say, where α is written for exp((λ−μ)τ). Thus the probability that the population size is
n at time τ is
$$\left(\frac{a}{c}\right)^{N_0}\sum_{r=0}^{\min(n,N_0)}\binom{N_0}{r}\binom{N_0+n-r-1}{n-r}\left(\frac{b}{a}\right)^{r}\left(\frac{d}{c}\right)^{n-r}.$$

The question of estimating the constants λ and μ has been considered by Anscombe (1953),
Kendall (1949, 1952) and Moran (1951, 1953). In his 1949 paper Kendall discusses the case
in which it is known that μ = 0, and observations are taken at times τ, 2τ, …, kτ. When
μ = 0 it is possible to express the probability of a population size n in a much simpler form
and find the maximum-likelihood (m.l.) estimate of λ. In the other four papers it is mainly
supposed that observation is continued until a certain number of events (births or deaths)
has occurred. Such observation may not always be practicable, but observation at regular
intervals may be feasible. The complex form of the above probability then almost prohibits
the use of m.l. estimation. The estimates of the constants then used must preferably have
only a small bias and must be reasonably accurate compared with the m.l. estimates for
continuous observation. We discuss the bias and variance of an estimate of α when k
evenly spaced population counts are made. If it is also known how many events have
occurred by the end of these k counts, the m.l. estimate of μ/λ for continuous observation
is available.


2. THE ESTIMATE OF THE RATE OF INCREASE, exp((λ−μ)τ)


Suppose the population, known to be N_0 at time 0, is counted at times τ, 2τ, …, kτ and that
its size at time iτ is N_i. Then the m.l. estimate of α = exp((λ−μ)τ), (a) when it is known that
μ = 0 (Kendall, 1949) and (b) when it is known that λ = 0, is
$$X_k = \frac{N_1+N_2+\dots+N_k}{N_0+N_1+\dots+N_{k-1}}. \qquad (2)$$
It is suggested that X_k be used to estimate α when neither λ nor μ is known to be 0. The
estimate is intuitively a reasonable one, since the average value of the ratio of corresponding
terms in the numerator and denominator is α. X_k is in general biased (see §2.1) and
demonstrably inconsistent (see §2.4) as k tends to infinity for a range of values of τ for given
λ, μ and N_0. However no simple unbiased estimate of α suggests itself (e.g. $(1/k)\sum_{1}^{k}N_i/N_{i-1}$
requires a stopping rule if, for some j, N_j, N_{j+1}, …, N_k = 0 and its average value, relative to this
rule, will in general be biased). Also X_k has the virtues that its bias (see §2.2) and variance
(see §3) tend to zero as N_0 tends to infinity, and that its asymptotic variance then can be
made of comparable size to that of the m.l. estimate of α for continuous observation by
making observations over a greater period (see §3.1).
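The estimate is simple enough to state as a short function. The sketch below is illustrative only; the name `ratio_estimate` and the choice to pass the counts as a list [N_0, N_1, …, N_k] are assumptions of this example.

```python
def ratio_estimate(counts):
    """X_k = (N_1 + ... + N_k) / (N_0 + ... + N_{k-1}) for counts [N_0, ..., N_k]."""
    if len(counts) < 2:
        raise ValueError("need N_0 and at least one later count")
    numerator = sum(counts[1:])     # N_1 + ... + N_k
    denominator = sum(counts[:-1])  # N_0 + ... + N_{k-1}
    return numerator / denominator


# Four equally spaced counts after an initial population of 10.
print(ratio_estimate([10, 12, 15, 19, 22]))
```

Because N_0 is positive the denominator never vanishes, so no stopping rule is needed even when the population dies out part-way through the counts.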


2.1. Bias of X_k


In finding E(X_k) we may average X_k first with respect to N_k for fixed N_1, …, N_{k−1},
then average the result of this over N_{k−1} for fixed N_1, …, N_{k−2}, and so on.
$$E(X_k \mid N_1,\dots,N_{k-1}) = \frac{N_1+\dots+N_{k-2}+N_{k-1}(1+\alpha)}{N_0+\dots+N_{k-1}}, \qquad (3)$$
since E(N_k | N_1, …, N_{k−1}) = αN_{k−1}.
When averaging over N_{k−1} we observe that the right-hand side is convex from above in
N_{k−1}. Then Jensen's convex function inequality for N_{k−1} shows that
$$E(X_k \mid N_1,\dots,N_{k-2}) \le \frac{N_1+\dots+N_{k-3}+N_{k-2}(1+\alpha+\alpha^2)}{N_0+\dots+N_{k-3}+N_{k-2}(1+\alpha)}.$$
The same argument repeated till N_0 is reached shows that
$$E(X_k) \le \alpha = \exp((\lambda-\mu)\tau). \qquad (4)$$
Since log x is also a function of x convex from above,
$$E(\log x) \le \log E(x) \quad\text{and}\quad E(\log X_k) \le \log E(X_k) \le \log\alpha = (\lambda-\mu)\tau.$$

That is, X_k is a negatively biased estimate of α = exp((λ−μ)τ), and log X_k is a negatively
biased estimate of (λ−μ)τ. A closer consideration shows that equality in (4) is never
attained for finite N_0 and k and positive τ.


2.2. N_0 large


This bias becomes small when N_0 is large. For, suppose x is a positive random variable
and A, B, C and D are positive constants. Then

whence
$$E\!\left[\frac{N_1+\dots+N_{k-2}+N_{k-1}(1+\alpha)}{N_0+\dots+N_{k-1}}\,\Big|\,N_1,\dots,N_{k-2}\right] \ge \frac{N_1+\dots+N_{k-3}+N_{k-2}(1+\alpha+\alpha^2)}{N_0+\dots+N_{k-3}+N_{k-2}(1+\alpha)} - \frac{(1+\alpha)\gamma N_{k-2}}{[N_0+\dots+N_{k-3}+N_{k-2}(1+\alpha)]^2}, \qquad (6)$$
where γN_{k−2} is var(N_{k−1} | N_{k−2}) and in fact γ = [(λ+μ)/(λ−μ)]α(α−1). If we continue this
process on the first term of the right-hand side downward through N_{k−2}, …, N_1 we achieve
a series of bias terms of the same sign. The above one is numerically less than or equal to
(1+α)γα^{k−2}/N_0, and they are all of this order in N_0. In fact the bias is
$$\le (\gamma/N_0)\,[(1+\alpha)\alpha^{k-2} + \dots + (1+\alpha+\dots+\alpha^{k-1})(1+\alpha+\dots+\alpha^{k-2})\cdot 1], \qquad (7)$$
and this tends to zero as N_0 tends to infinity, for fixed α and k.


2.3. Leading term for large N_0


We use the usual asymptotic value for E(X_k) formed by the average value of the quadratic
term in the Taylor expansion of X_k in powers of (N_1 − N̄_1), …, (N_k − N̄_k). Suppose X̄_k is the
value of X_k when each N_i is set equal to N̄_i. Then the bias
$$= \tfrac{1}{2}\sum_{i,j=1}^{k}\frac{\partial^2\bar{X}_k}{\partial N_i\,\partial N_j}\operatorname{cov}(N_i,N_j).$$
The cubic term is of order 1/N_0², and
$$\operatorname{cov}(N_i,N_j) = N_0\,\alpha^{j}(\alpha^{i}-1)(\lambda+\mu)/(\lambda-\mu) \quad\text{for } j \ge i.$$
The bias is, to first order in 1/N_0,
$$\frac{(\lambda+\mu)(\alpha-1)[k\alpha^{k}-(k-1)\alpha^{k+1}-\alpha]}{N_0(\lambda-\mu)(\alpha^{k}-1)^2} = -\frac{(\lambda+\mu)(\alpha-1)^3}{N_0(\lambda-\mu)(\alpha^{k}-1)^2}\,[(k-1)\alpha^{k-1}+(k-2)\alpha^{k-2}+\dots+\alpha]. \qquad (8)$$

This term disappears when k becomes large for α greater than 1 but remains finite for α less
than or equal to 1. This suggests that the behaviour of lim_{k→∞} E(X_k) when N_0 is not necessarily
large may vary with the value of α.
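The two algebraic forms of the leading bias term in (8) can be compared numerically. The following sketch is illustrative only: the function name `leading_bias` is an assumption of this example, it requires λ ≠ μ, and it evaluates the closed form and the series form, namely −(λ+μ)(α−1)³[α + 2α² + … + (k−1)α^{k−1}] / [N_0(λ−μ)(α^k−1)²], side by side.

```python
import math


def leading_bias(lam, mu, tau, k, n0):
    """Return (closed form, series form) of the order-1/N_0 bias of X_k."""
    alpha = math.exp((lam - mu) * tau)
    scale = n0 * (lam - mu) * (alpha**k - 1.0) ** 2
    closed = (lam + mu) * (alpha - 1.0) * (
        k * alpha**k - (k - 1) * alpha ** (k + 1) - alpha
    ) / scale
    # alpha + 2*alpha^2 + ... + (k-1)*alpha^(k-1)
    series = sum(j * alpha**j for j in range(1, k))
    expanded = -(lam + mu) * (alpha - 1.0) ** 3 * series / scale
    return closed, expanded


for k in (1, 2, 5, 20):
    print(k, leading_bias(1.0, 0.5, 1.0, k, 100))
```

For k = 1 both forms vanish, in agreement with X_1 = N_1/N_0 having expectation exactly α; for larger k the term is negative, as the Jensen argument of §2.1 requires.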


2.4. E(X_k) for large k


We have been unable to find results for large k holding for the whole range of μ/λ, but
those given below indicate that the bias is most serious when λ is less than or equal to μ.

This is to be expected as the probability that N_k is 0 then tends to 1 as k tends to infinity,
when τ is fixed. There is thus a high probability for large k that the last ratio of corresponding
terms in the numerator and denominator in X_k is 0, which is less than α.

We may divide the process of averaging X_k over the values of N_1, …, N_k into

(a) summation over N_k, for any fixed N_1, …, N_{k−1}. Then

(b) summation over all the sets (N_1, …, N_{k−1}) of which no member is zero and

(c) summation over the (k−1) groups in which the first member to be zero is in turn
N_1, …, N_{k−1}.

Then E(X_k) has contributions from (b) and (c) after (a) has been done.
(a) First for any fixed N_1, N_2, …, N_{k−1}
$$\sum_{N_k}X_k\operatorname{Prob}(N_1,\dots,N_k) = \frac{N_1+\dots+N_{k-2}+N_{k-1}(1+\alpha)}{N_0+\dots+N_{k-1}}\operatorname{Prob}(N_1,\dots,N_{k-1}). \qquad (9)$$

(b) The ratio in (9), for any set of positive values of N_1, …, N_{k−1}, is less than or equal to
1+α. The contribution to E(X_k) from (b) is then less than or equal to $(1+\alpha)\bigl(1-\sum_{r=1}^{k-1}P_r\bigr)$,
where P_r is the probability that N_r is the first zero N_i, i = 1, 2, …. P_r is the total probability
that N_r is zero, minus the total probability that N_{r−1} is zero,
$$P_r = \left[\frac{\mu(\alpha^{r}-1)}{\lambda\alpha^{r}-\mu}\right]^{N_0} - \left[\frac{\mu(\alpha^{r-1}-1)}{\lambda\alpha^{r-1}-\mu}\right]^{N_0}. \qquad (10)$$
Then $\sum_{1}^{k-1}P_r = \left[\mu(\alpha^{k-1}-1)/(\lambda\alpha^{k-1}-\mu)\right]^{N_0}$ and the contribution from (b) is less than or equal to
$$(1+\alpha)\left[1-\left(\frac{\mu(\alpha^{k-1}-1)}{\lambda\alpha^{k-1}-\mu}\right)^{N_0}\right].$$

(c) If N_r, when r is one of 1, …, k−1, is the first N_i to be zero
$$X_k = 1 - N_0/(N_0+\dots+N_{r-1}) < 1 \qquad (11)$$
and the contribution to E(X_k) from (c) is less than or equal to $\sum_{1}^{k-1}P_r$.
Hence from (a), (b) and (c),
$$E(X_k) \le 1+\alpha-\alpha\left(\frac{\mu(\alpha^{k-1}-1)}{\lambda\alpha^{k-1}-\mu}\right)^{N_0}. \qquad (12)$$
Now if λ is greater than μ this upper bound is approximately 1 + α − α(μ/λ)^{N_0} for large k.
The inequality
$$\alpha > (\lambda/\mu)^{N_0} \qquad (13)$$
determines, for given λ, μ and N_0, a value τ_0 of τ such that X_k is inconsistent with respect
to k for τ > τ_0.
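Taking logarithms in (13), with α = exp((λ−μ)τ), gives the threshold explicitly as τ_0 = N_0 log(λ/μ)/(λ−μ). The sketch below merely evaluates this expression; the function name `tau_zero` is an assumption of this example, and the result applies only to the case λ > μ > 0 covered by (13).

```python
import math


def tau_zero(lam, mu, n0):
    """Spacing beyond which the large-k bound in (12) falls below alpha."""
    if not lam > mu > 0:
        raise ValueError("(13) is stated for lam > mu > 0")
    return n0 * math.log(lam / mu) / (lam - mu)


# Threshold spacing for a birth rate twice the death rate and N_0 = 5.
print(tau_zero(2.0, 1.0, 5))
```

The threshold grows in proportion to N_0, which is consistent with the later remark that a large initial population makes the bias small.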

If the term N_0/(N_0 + … + N_{r−1}) of (11) which has so far been omitted is taken into account,
the upper bound in (12) is reduced by
$$\sum_{r=1}^{k-1}\sum_{N_i}[N_0/(N_0+\dots+N_{r-1})]\operatorname{Prob}(N_1,\dots,N_{r-1},0), \qquad (14)$$
where the inner summation is taken over the non-zero values of N_1, …, N_{r−1}, when N_r is zero.
Consideration of (14) might for some values of λ and μ considerably lower the upper
bound of E(X_k) and so decrease the interval τ_0. No general necessary and sufficient condition
for the consistency of the estimate X_k as k becomes large has emerged from such a
consideration. However, some interesting particular results can be obtained.
If λ = μ that part of (14) which comes from N_1 being zero is [λτ/(1+λτ)]^{N_0} and
$$E(X_k) \le 2 - [\lambda\tau/(1+\lambda\tau)]^{N_0} - [(k-1)\lambda\tau/(1+(k-1)\lambda\tau)]^{N_0} \to 1 - (\lambda\tau/(1+\lambda\tau))^{N_0} \text{ for large } k. \qquad (15)$$

Hence there is always inconsistency no matter what finite value N_0 has.
If λ is less than μ a similar use of the first term of (14) shows that there is always
inconsistency if N_0 = 1. But for N_0 greater than 1 such use involves conditions on λ and μ for
inconsistency to be demonstrable. If, however, only the second term of (14) is included we
may prove a stronger result. This second term is
$$\sum_{N_1}[N_0/(N_0+N_1)]\left(\frac{\mu(1-\alpha)}{\mu-\lambda\alpha}\right)^{N_1}\operatorname{Prob}(N_1),$$
where the summation is over non-zero values of N_1 and Prob(N_1) is that given by the
coefficient of z^{N_1} in the p.g.f. (1). Then by Schwarz's inequality the term has a value
$$\ge N_0P_2^2\Big/\sum_{N_1}(N_0+N_1)\,[\mu(1-\alpha)/(\mu-\lambda\alpha)]^{N_1}\operatorname{Prob}(N_1) \ge P_2^2/(P_2+\alpha). \qquad (16)$$
Hence
$$E(X_k) \le 1 - P_1 - P_2^2/(P_2+\alpha) + \alpha\left[1-\left(\frac{\mu(\alpha^{k-1}-1)}{\lambda\alpha^{k-1}-\mu}\right)^{N_0}\right] \to 1 - P_1 - P_2^2/(P_2+\alpha) \qquad (17)$$
for large k, and any α less than 1.
Suppose α is small. Then this bound
$$\simeq \alpha N_0(\mu-\lambda)/(N_0(\mu-\lambda)+\mu).$$
It is always possible therefore to find a τ, say τ_1, making α small enough for this bound
to be less than α. Hence X_k is inconsistent with respect to k for τ greater than τ_1.


2.41. The case μ = 0


That consistency with respect to k is possible is shown by the limiting behaviour of E(X_k)
when it is known that μ = 0 and X_k is used to estimate exp(λτ). Then the joint p.g.f. of
N_1, N_2, …, N_k can be written down, and hence that of N_k − N_0 and N_0 + … + N_{k−1}. Suppose
this is φ(u, v), where u and v are carriers for N_k − N_0 and N_0 + … + N_{k−1}, respectively. Then


B(X,) = 14 {5 [bl Maa dole (18) 
It follows from (1) with = 0 that 


vk ™ 9 
$(u, v) = be 1) u(ak-1 + ogk—2y + ea ~ 
Then 
Ts an ee \ PL NovNok tack — vk) (x — v)No 
MX) = 140-1) | arena 
( 


1 pNok- —1(gk — 1) (a a—1)No 
>1 =i d 
sa he (ak(1 —v) +a —1)Nott . 
1 
= 1+(a—1)Notl( vNok—-l d(ak(1—v)+a—1)-No 
J0 


= 1+(a@—1)(l—a-*) — (a —1)No+t (1 —a-*) (Nk —- vf yNok—2(ak(1 —v) +a—1)odv. 
J 0 
(20) 


e 


The integral here is positive and less than $\int_0^1(\alpha^{k}(1-v)+\alpha-1)^{-N_0}\,dv$, which can be evaluated.
For N_0 greater than 1 it is O(α^{−k}) and for N_0 = 1 it is O(kα^{−k}). Hence the right-hand side has
α for its limit as k tends to infinity. But E(X_k) is less than or equal to α. Hence lim_{k→∞} E(X_k)
exists and is α.

2.42. The case λ = 0


The actual limiting bias can be found when it is known that λ = 0, so that X_k is an estimate
of exp(−μτ). By the same method as was used in §2.41 we find that
$$E(X_k) \to 1 - N_0(1-\alpha)^{N_0}\int_0^1\frac{v^{N_0-1}}{(1-\alpha v)^{N_0}}\,dv. \qquad (21)$$
For example for N_0 = 1, when the bias is biggest,
$$E(X_k) \to 1 - ((1-\alpha)/\alpha)\log[1/(1-\alpha)]. \qquad (22)$$
This takes values 0, 0.107, 0.234, 0.389, 0.598, 0.744, for α = 0, 0.2, 0.4, 0.6, 0.8, 0.9,
respectively.
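The limit in (22) can be checked directly for the simplest case, a pure death process starting from a single individual. The sketch below is an illustration under that assumption, and the function names are inventions of this example. If the individual dies between the m-th and (m+1)-th counts, which happens with probability α^m(1−α), then X_k = m/(m+1); if it survives all k counts, X_k = 1.

```python
import math


def expected_xk_pure_death(alpha, k):
    """Exact E(X_k) when lambda = 0 and N_0 = 1, with alpha = exp(-mu * tau)."""
    total = alpha**k  # still alive at the k-th count: X_k = k/k = 1
    for m in range(k):
        # alive at counts 1..m, dead from count m+1 on: X_k = m/(m+1)
        total += alpha**m * (1.0 - alpha) * m / (m + 1)
    return total


def limiting_value(alpha):
    """Right-hand side of (22), for 0 < alpha < 1."""
    return 1.0 - (1.0 - alpha) / alpha * math.log(1.0 / (1.0 - alpha))


for alpha in (0.2, 0.4, 0.6, 0.8, 0.9):
    print(alpha, expected_xk_pure_death(alpha, 500), limiting_value(alpha))
```

The finite-k expectation equals α at k = 1 and decreases towards the limiting value as further counts are added, which illustrates why taking more counts does not improve the estimate in this case.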

3. THE VARIANCE OF X_k FOR LARGE N_0

§2.4 shows that there is not in general an improvement in the average value of X_k as k
becomes large, but §2.2 shows that a large N_0 means that the bias is small. It is thus natural
to consider the accuracy of X_k rather for large N_0 than for large k. We shall use the variance
of X_k for large N_0 as a yardstick for measuring the efficiency of this type of regular
observation compared with continuous observation.

The limiting form of the variance of X_k to order 1/N_0 is
$$\sum_{i,j=1}^{k}\frac{\partial\bar{X}_k}{\partial N_i}\frac{\partial\bar{X}_k}{\partial N_j}\operatorname{cov}(N_i,N_j) = \frac{(\lambda+\mu)\,\alpha(\alpha-1)^2}{N_0(\lambda-\mu)(\alpha^{k}-1)}. \qquad (23)$$
The coefficient of 1/N_0 is small for large k when α is greater than or equal to 1, but for α less
than 1, increasing k gives little improvement.

We wish to compare this variance with that of the m.l. estimate of α when observation is
continuous. The justification for the use of the m.l. estimate is that if N_0 is large there is
effectively a large number of independent replicates. Suppose such observation is made
over a period t and that r events have happened at intervals τ_1, τ_2, …, τ_r so that
$$\tau_1+\tau_2+\dots+\tau_r < t.$$

Then the likelihood of this happening when there were N_0 members at the beginning of
observation is
$$\exp[-N_0(\lambda+\mu)\tau_1]\,N_0\nu_0\,\exp[-n_1(\lambda+\mu)\tau_2]\,n_1\nu_1\dots\exp[-n_{r-1}(\lambda+\mu)\tau_r]\,n_{r-1}\nu_{r-1} \times \exp[-n_r(\lambda+\mu)(t-(\tau_1+\dots+\tau_r))]. \qquad (24)$$
In this ν_i is either λ or μ depending on whether the (i+1)th event that has happened is
a birth or a death; n_{i+1} is n_i + 1 if ν_i is λ and is n_i − 1 if ν_i is μ. Here n_0 = N_0. Then the log
likelihood is
$$L = -(\lambda+\mu)X + \sum_{0}^{r-1}\log n_i + B\log\lambda + D\log\mu, \qquad (25)$$
where X = N_0τ_1 + n_1τ_2 + … + n_{r−1}τ_r + n_r(t − (τ_1 + … + τ_r));
B is the number of births, and D the number of deaths in time t. The m.l. estimate of λ − μ
is (B − D)/X. The variance of the m.l. estimate, exp(t(B−D)/X), of α_t = exp[(λ−μ)t] is,
to the first order in 1/N_0,
$$t^2\alpha_t^2\left[\frac{(\overline{B-D})^2}{\bar{X}^4}\operatorname{var}X - 2\,\frac{\overline{B-D}}{\bar{X}^3}\operatorname{cov}(X,B-D) + \frac{1}{\bar{X}^2}\operatorname{var}(B-D)\right], \qquad (26)$$


where $\overline{B-D}$ and X̄ are mean values.
Now B − D = n_t − N_0,
$$E(B-D) = N_0(\alpha_t-1),$$
and
$$\operatorname{var}(B-D) = N_0(\lambda+\mu)\alpha_t(\alpha_t-1)/(\lambda-\mu). \qquad (27)$$
Again, since $X = \int_0^t n_u\,du$,
$$E(X) = N_0(\alpha_t-1)/(\lambda-\mu),$$
$$E(X^2) = \int_0^t\!\!\int_0^t E(n_un_v)\,du\,dv = 2\int_0^t du\int_u^t E(n_u^2)\,e^{(\lambda-\mu)(v-u)}\,dv, \text{ etc.,}$$
and
$$E(X(B-D)) = \int_0^t E(n_u(n_t-N_0))\,du = \int_0^t E(n_u^2)\,e^{(\lambda-\mu)(t-u)}\,du - \bar{X}N_0, \text{ etc.} \qquad (28)$$
Then for large N_0
$$\operatorname{var}\left\{\exp\left(t\,\frac{B-D}{X}\right)\right\} \simeq \frac{(\lambda+\mu)\,\alpha_t^2(\log\alpha_t)^2}{N_0(\lambda-\mu)(\alpha_t-1)}. \qquad (29)$$

We may first compare (29) with (23) when k is 1 and t is taken as τ. Since this is a comparison
of the variance of the m.l. estimate of α for continuous observation over τ with the variance
of X_k for one observation at τ we may define the ratio of these variances as the efficiency
of X_1. This ratio is
$$\left\{\frac{\tfrac{1}{2}(\lambda-\mu)\tau}{\sinh(\tfrac{1}{2}(\lambda-\mu)\tau)}\right\}^2. \qquad (30)$$

This is the same formula for λ − μ as was Kendall's formula for λ (Kendall, 1949).
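The efficiency of a single count can be evaluated in two ways: from the hyperbolic-sine expression, and as the ratio of the large-N_0 variance under continuous observation over τ to the variance of X_1. The sketch below is illustrative only and its function names are assumptions of this example; the common factor (λ+μ)/[N_0(λ−μ)] is cancelled by hand, leaving α²(log α)²/(α−1) divided by α(α−1).

```python
import math


def efficiency_of_x1(lam, mu, tau):
    """{h / sinh(h)}^2 with h = (lam - mu) * tau / 2."""
    half = 0.5 * (lam - mu) * tau
    if half == 0.0:
        return 1.0  # limiting value as lam - mu tends to zero
    return (half / math.sinh(half)) ** 2


def variance_ratio(lam, mu, tau):
    """Continuous-observation variance over the variance of X_1 (lam != mu)."""
    alpha = math.exp((lam - mu) * tau)
    continuous = alpha**2 * math.log(alpha) ** 2 / (alpha - 1.0)
    single_count = alpha * (alpha - 1.0)
    return continuous / single_count


print(efficiency_of_x1(1.0, 0.5, 1.0), variance_ratio(1.0, 0.5, 1.0))
```

The efficiency is symmetric in the sign of λ − μ and falls away from 1 only slowly, so a single count loses little information unless the population changes by a large factor over the interval.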


3.1. Further comparison of efficiencies
Suppose a population is known to have a large size N_0 at time 0. Then we may compare
the accuracy of the following estimates of α = exp((λ−μ)τ):
(a) X_k when the population is counted only at τ, 2τ, …, kτ, and
(b) exp((B−D)τ/X) when the population is continuously observed over the interval (0, t).
The limiting variance of X_k for large N_0 is (23). That of exp((B−D)τ/X) is (29) multiplied
by (τ/t)²α^{2−2t/τ}. Suppose t/τ = u. This latter variance is then
$$\frac{(\lambda+\mu)\,\alpha^2(\log\alpha)^2}{N_0(\lambda-\mu)(\alpha^{u}-1)}.$$

Hence the accuracy of the estimates (a) and (b) for large N_0 is the same if
$$\alpha^{u}-1 = (\alpha^{k}-1)\,\alpha(\log\alpha)^2/(\alpha-1)^2. \qquad (31)$$

For a given α and k a unique solution of this for u always exists. For, the left-hand side is
always monotonic in u for a given α.

If α is greater than 1 the left-hand side goes from zero to infinity as u goes from zero to
infinity, and this range includes the value of the right-hand side.

If α tends to 1, u tends to k.

If α is less than 1 the left-hand side goes from zero to −1 as u goes from zero to infinity.
By comparing the derivatives of log α and (α−1)/√α we can show that α(log α)²/(α−1)² is
always less than 1 for all positive α other than 1. Hence the range zero to −1 includes the
value of the right-hand side.

It also follows that, except when α = 1, k is always greater than u. That is, kτ is greater
than t, as indeed is intuitively obvious as (a) can in general only be as efficient as (b) if X_k
embodies some information not available to (b), i.e., if at least the last of the k observations
is made after t.

When α approaches 1 the rate of change in population size becomes small and continuous
information becomes of little extra assistance in its estimation.

Table 1 gives the values of u for k = 1 (when (23) is exact) and a range of values of α. Then
$$u = \frac{\log[1+\alpha(\log\alpha)^2/(\alpha-1)]}{\log\alpha}. \qquad (32)$$
We thus have the following result.

We know that a population has a large size N_0 at time zero and wish to make an estimate
of its rate of increase when it develops according to a simple birth and death process. The
most natural estimate to take is the m.l. estimate formed, as in §3, from the record of
continuous observations of the population over a period t.








Table 1. Values of u for k = 1

    α       u          α       u
   0.1    0.386       2.0    0.971
   0.2    0.649       3.0    0.941
   0.4    0.890       4.0    0.916
   0.6    0.972       5.0    0.898
   0.8    0.955      10.0    0.838
   1.0    1.000





This section and §2.2 show that the estimate X_1 formed from a single population count
at time τ is approximately unbiased, and of as low a variance as the m.l. estimate
derived from continuous observation over (0, t), if τ is greater than a particular value, t/u,
greater than t. For a wide range of α Table 1 shows that this value is not very much
greater than t. For example, for α in (0.294, 16.1) it is less than 1.25t.
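The equivalent length of continuous observation can be computed by solving α^u − 1 = (α^k − 1)α(log α)²/(α − 1)² for u, which for k = 1 reduces to the expression tabulated in Table 1. The sketch below is illustrative only; the function name `equivalent_u` is an assumption of this example.

```python
import math


def equivalent_u(alpha, k=1):
    """u = t/tau at which continuous observation over (0, t) matches X_k."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        return float(k)  # limiting value as alpha tends to 1
    rhs = (alpha**k - 1.0) * alpha * math.log(alpha) ** 2 / (alpha - 1.0) ** 2
    return math.log(1.0 + rhs) / math.log(alpha)


for alpha in (0.1, 2.0, 3.0, 4.0, 10.0):
    print(alpha, round(equivalent_u(alpha), 3))
```

Passing k greater than 1 gives the corresponding value for several counts, which always lies below k in line with the remark that kτ must exceed t.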


4. ESTIMATION OF μ/λ


If N_1, …, N_k are the only observed quantities, estimation of μ/λ is likely to be very
inaccurate since the range of values of μ/λ giving the same set N_1, …, N_k with a reasonable
probability is very large. The extra information of the actual number of births and deaths
that have occurred may be available. Then, since the m.l. estimate of μ/λ is, from (25), D/B,
this estimate can be used, and it will be fully efficient for large N_0. The theoretical difficulty
is that a rule would have to be devised to cope with the situations B = 0, or D and B = 0.
In the first case λ = 0 is a suitable inference. The second case should of course be avoided
by N_0 being made large enough. The probability of there being no events at all in time τ is
exp[−N_0(λ+μ)τ] and of there being no births [μ/(λ+μ) + (λ/(λ+μ)) exp(−(λ+μ)τ)]^{N_0},
both of which can be made small for non-zero λ and μ by an increase in N_0.
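The two probabilities just quoted are easy to tabulate when deciding how large N_0 should be. The sketch below is illustrative only and the function names are assumptions of this example.

```python
import math


def prob_no_events(lam, mu, tau, n0):
    """Probability that no birth or death occurs in time tau."""
    return math.exp(-n0 * (lam + mu) * tau)


def prob_no_births(lam, mu, tau, n0):
    """Probability that none of the n0 initial individuals gives birth in time tau."""
    rate = lam + mu
    single = mu / rate + (lam / rate) * math.exp(-rate * tau)
    return single**n0


for n0 in (1, 5, 20):
    print(n0, prob_no_events(1.0, 1.0, 1.0, n0), prob_no_births(1.0, 1.0, 1.0, n0))
```

Both quantities decrease geometrically in N_0, though the second more slowly, so it is the no-births case that governs the choice of N_0.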


I am greatly indebted to the referee for his suggestions for the improvement of this paper.

EXAMINATION OF A QUANTUM HYPOTHESIS BASED
ON A SINGLE SET OF DATA


By S. R. BROADBENT


The British Coal Utilization Research Association 


1. INTRODUCTION 


In an earlier paper (Broadbent, 1955) the following situation was discussed: an experimenter
makes the observations y_1, y_2, …, y_n and wishes to compare his data with the quantum
hypothesis
$$y_i = \beta + 2r_i\delta + \epsilon_i \quad (i = 1,\dots,n). \qquad (1)$$
Here β and 2δ are constants (2δ is the quantum), r_i is zero or an integer and ε_i is the error of
observation. On this hypothesis the data will be grouped about regularly spaced means and
the grouping will be apparent if the ε_i are small in comparison with δ. The alternative
hypothesis, that the y_i are not so grouped, but are distributed unimodally or rectangularly,
is called the rectangular hypothesis.

Solutions were suggested in the previous paper to the following problems:

(i) estimation of β and of 2δ,

(ii) estimation of the variance of ε_i,

(iii) testing whether to accept a quantum hypothesis which is independent of the data
used in the test.

Sometimes the quantum hypothesis to be tested is not independent of the data alleged 
to support it, but the data have actually suggested the hypothesis. In effect the experimenter 
says to the statistician: ‘My data appears to be grouped about regularly spaced means. 
I have no independent evidence of the positions of these groups, and the body of scientific 
knowledge (and my prior belief) is neither strongly for nor strongly against such a quantum 
hypothesis. Is the apparent grouping a coincidence, or does it indicate some physical 
reality?’ This question was specifically excluded in the previous paper, and is the subject 
of the present investigation. 

The situation may be clarified by two analogies which show that this type of problem is
common in statistics. The first is with the χ² test of goodness of fit. Here the agreement
between data and hypothesis is measured by χ²; when the hypothesis is independent of the
data the number of degrees of freedom used in the test equals the number of classes used
minus one. But if the data are used to estimate parameters which the hypothesis does not
specify, the χ² test must be modified. The modification given by Fisher (1924) consists
simply in reducing the number of degrees of freedom used in the test by the number of
parameters estimated. If the data were used for other purposes, for example, to suggest
whether one or another type of distribution be fitted to the data or what the class limits
should be, the modification would no longer be simple. The second analogy is with the
analysis of time series, which is closely related to this problem. In periodogram analysis
agreement with the trial periods may be measured by the intensity S². If the hypothesis
(i.e. the trial period) is independent of the data, the significance of S² may be tested. But
this independence is often not the case; for example, Kendall (1946) showed that of
Beveridge's 18 or 19 'real' periods in the analysis of the wheat-price index, at least
three-quarters were spurious. Kendall concludes that tests of significance in the periodogram
remain undiscovered; Irwin (1956) and Rudra (1955) have recently discussed such problems.

We shall for simplicity suppose β = 0 in the hypothesis (1), and we write 2d for a quantum
considered after examination of the data whereas 2δ is the quantum in a hypothesis
independent of the data. The statistic s²/d² (see the previous paper and (2) below) measures the
agreement between the data and a proposed quantum 2d.

On the rectangular hypothesis, s²/d² has mean 1/3, variance 4/(45n) and is approximately
normally distributed. A significantly small value of s²/δ² indicates the validity of the quantum
hypothesis. The value of s²/d² may have to be considerably smaller than a conventional
significance point of s²/d² to validate the quantum hypothesis, since use of the data to suggest
a quantum implies a low value of s²/d², on either hypothesis. How small s²/d² may have to
be is indicated by the experiment described below.

Fig. 1. s²/d² as a function of 1/d for a single set of 20 observations uniformly distributed.
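Conventional lower significance points of s²/d² on the rectangular hypothesis can be approximated from the stated mean 1/3 and variance 4/(45n). The sketch below assumes the normal approximation mentioned in the text, not the tabulated values of Broadbent (1955); the function name `lower_point` is an invention of this example.

```python
from statistics import NormalDist


def lower_point(n, level):
    """Approximate lower significance point of s^2/d^2 for n observations."""
    mean = 1.0 / 3.0
    sd = (4.0 / (45.0 * n)) ** 0.5
    # inv_cdf(level) is negative for level < 0.5, giving a point below the mean
    return mean + NormalDist().inv_cdf(level) * sd


for level in (0.05, 0.01, 0.001):
    print(level, round(lower_point(20, level), 4))
```

For twenty observations the 0.1% point obtained in this way agrees with the value 0.1273 quoted in the discussion of Fig. 1.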





Consider the variation in s²/d² for a fixed set of observations as d changes. We restrict
attention to values of d smaller than the largest observation, since the data cannot give
information about larger quanta. It is convenient to consider 1/d increasing in value from
the reciprocal of the largest observation. Then s²/d² appears as a violently oscillating
continuous function. An example is given in Fig. 1, which shows the stationary values of
s²/d² (joined by straight lines) for a set of twenty observations, nineteen of which were
randomly drawn from a uniform distribution between 0 and 1, 1 being taken as the twentieth
observation. This sampling procedure is discussed in §3. On this figure are shown also the
mean and the lower 5 and 1% significance points of s²/d² for twenty observations on the
rectangular hypothesis (taken from Table 3 of Broadbent, 1955). Although the observations
actually obeyed the rectangular hypothesis, s²/d² fell below the 5% point in no less than
thirteen intervals for 1/d between 1 and 100, and below the 1% point in three intervals.
At 1/d = 80.1, s²/d² nearly reached the 0.1% point which is 0.1273. At all these values of
1/d, the data would have been conventionally judged to be significantly grouped.
The experimenter should realize that s²/d² may be made as small as he pleases by suitable
choice of d; indeed, if the data are rational, s²/d² vanishes for infinitely many values of d.
When he considers possible quanta, he is in effect looking for the minima of s²/d² as a function
of 1/d. Small values of s²/d² at large values of 1/d are expected even on the rectangular
hypothesis. We proceed on the assumption that the experimenter is looking for minima of
s²/d² at small values of 1/d (i.e. large values of the quantum 2d).

To find the distribution of such minima on the rectangular hypothesis seems to be a very
difficult problem in analysis. Conventional analysis of extreme-value problems does not
appear to apply to the minima of such functions as s²/d², in which the random element (the
observations) enters once and for all, and is thereafter treated as fixed. The distribution is
accordingly investigated by sampling methods in the third section, the calculations required
being developed in the second section. The conclusions may be of assistance in testing
proposed quanta for significance.

Two bases for this test should be emphasized. It is assumed that the experimenter is
searching for possible quanta, and will propose the quantum which seems to him most
unlikely on the rectangular hypothesis. This search for a quantum is analogous to the
estimation of a parameter in the χ² test of goodness of fit, and the test proposed is in effect
analogous to the well-known modification to this χ² test. We do not take into consideration
the likely but imponderable event, that the experimenter originally decided to search for
a quantum only because his particular data suggested it (just as the usual modification to
the χ² test does not take into account a possible choice of hypothesis tested). Secondly, the
experimenter's method of choosing a quantum will probably not agree exactly with the
method described here. The first consideration affects the test in the following way: if our
test shows that the experimenter's data are not inconsistent with the rectangular hypothesis,
we may confidently assert that the data do not support his quantum hypothesis. Similarly,
if the experimenter does not assume β = 0 but estimates a value for β, and his s²/d² is not
inconsistent with the rectangular hypothesis, we may use the test described below to reject
his quantum hypothesis. The second consideration operates in the opposite way, that is,
even if our test would reject his proposed quantum, it remains possible that he could with
improved methods propose a quantum which fits his data more closely. It is, however,
likely that intuitive estimators of quanta do not give estimates very different from optimum
estimators; see, for example, the discussion on Prof. Thom's data (1955). In this case the
onus is still on the experimenter to propose the quantum which fits the data better.


2. ANALYSIS OF OBSERVATIONS 


Suppose 7¥/,, Yo, ---, ¥, are the (positive) observations made by the experimenter and arranged 
in increasing orc.er of magnitude; the adjustments necessary if two or more of the y; are equal 
are trivial and not detailed here. The agreement of these observations with any proposed 
quantum 2d is clearly measured by 


n 
s*/d? = & (Yi— 2r¢d)*|(nd"). (2) 
{= 
Here r; is zero or that integer which minimizes | y,;—2r,d |; this does not define 7; uniquely 


when y; = (2m+1)d for m zero or an integer, and in this case we take r; = m+ 1. The set 
{r1,%9) ---s Tn} we write r(d); the vector is a function of the ebservations and of d. 

We noticed in the first section that the largest value of $d$ we need consider is $d = y_n$.
Suppose that values smaller than $d^*$ need not be considered; for example, $d^*$ might be the
order of precision with which the observations were made. We consider the behaviour of
$r(d)$ and $s^2/d^2$ for a fixed set of observations as $d$ varies from $y_n$ to $d^*$.

The $i$th element of $r(d)$ increases from $m$ to $(m+1)$ as $d$ attains the value $y_i/(2m+1) = d_{im}$.
We arrange the values of $d_{im}$ ($i = 1, 2, \ldots, n$; $m = 0, 1, \ldots$) in decreasing order of magnitude
from $d_{n0} = y_n$. The interval between $d_{im}$ and the next smallest element of the ordered set we
call the interval $[d_{im}]$; the interval is closed at $d_{im}$ and open at the other end. We notice that
at $d = y_n$ the vector $r(d)$ is $\{0, 0, \ldots, 1\}$; the vector remains constant for all values of $d$ within
any interval $[d_{im}]$, and changes, in the way just described, only at the values $d_{im}$.

It is obvious that $s^2/d^2$ is a differentiable function of $d$ within any interval $[d_{im}]$. At $d_{im}$, it
may be shown that $s^2/d^2$ is continuous but not differentiable; the term $(y_i - 2r_i d)^2/(nd^2)$ has
slopes $\pm 2/(ny_i)$ at $d_{im} \pm \epsilon$ as $\epsilon \to 0$ (independent of $m$), and so has an upward-pointing cusp
there. It follows that $s^2/d^2$ cannot have a stationary value or minimum at any $d_{im}$, but may
have a maximum there. All minima of $s^2/d^2$ therefore occur strictly within the intervals $[d_{im}]$
and may be found by equating to zero the differential of $s^2/d^2$ with regard to $d$, in which $r(d)$
is treated as constant.

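The necessary condition for a minimum is

$$2d = \sum_{i=1}^{n} y_i^2 \Big/ \sum_{i=1}^{n} r_i y_i, \qquad (3)$$

and here $s^2/d^2$ has the value

$$4\left\{\sum_{i=1}^{n} r_i^2 \sum_{i=1}^{n} y_i^2 - \Big(\sum_{i=1}^{n} r_i y_i\Big)^2\right\} \Big/ \Big(n \sum_{i=1}^{n} y_i^2\Big). \qquad (4)$$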


It follows that in each interval $[d_{im}]$ there is at most one stationary value. In the interval
$r(d)$ is constant, and insertion of the corresponding values of $r_i$ in (3) gives a value of $d$ which
may or may not lie in the interval $[d_{im}]$. If $d$ does lie in the interval, the stationary value
of $s^2/d^2$ is given by (4); if it does not, there is no stationary value in the interval.
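
The algebra leading from (2) to (3) and (4) can be written out as follows. This is a sketch of the intermediate steps, with $r(d)$ held constant inside one interval; it is not quoted from the paper, and all sums run over $i = 1, \ldots, n$.

```latex
\begin{aligned}
\frac{s^2}{d^2} &= \frac{1}{n}\sum \Big(\frac{y_i}{d} - 2r_i\Big)^2
  = \frac{1}{n}\Big\{\frac{1}{d^2}\sum y_i^2 - \frac{4}{d}\sum r_i y_i + 4\sum r_i^2\Big\},\\[4pt]
\frac{\partial}{\partial d}\Big(\frac{s^2}{d^2}\Big)
  &= \frac{1}{n}\Big\{-\frac{2}{d^3}\sum y_i^2 + \frac{4}{d^2}\sum r_i y_i\Big\} = 0
  \quad\Longrightarrow\quad 2d = \frac{\sum y_i^2}{\sum r_i y_i},\\[4pt]
\text{and, putting } \frac{1}{d} = \frac{2\sum r_i y_i}{\sum y_i^2}:\quad
\frac{s^2}{d^2} &= \frac{1}{n}\Big\{\frac{4(\sum r_i y_i)^2}{\sum y_i^2}
  - \frac{8(\sum r_i y_i)^2}{\sum y_i^2} + 4\sum r_i^2\Big\}
  = \frac{4\{\sum r_i^2 \sum y_i^2 - (\sum r_i y_i)^2\}}{n\sum y_i^2}.
\end{aligned}
```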

It is now possible to lay down a procedure which gives a list of all 2d we might suspect to be 
quanta, down to any predetermined small value 2d*. The procedure is lengthy, but is suitable 
for use with mechanical methods of computing; the calculations described below were carried 
out on Hollerith machines. The procedure is as follows: 

(a) Compute $d_{im} = y_i/(2m+1)$ for $i = 1, 2, \ldots, n$ and $m = 0, 1, \ldots$. The calculation stops
for each $i$ at the first $d_{im}$ less than $d^*$.

(b) Arrange the $d_{im}$ computed in (a) for all $i$ and $m$ in decreasing order of magnitude, down
to the first $d_{im}$ less than $d^*$.

(c) Compute $\sum_{i=1}^{n} y_i^2$.

(d) Compute $\sum_{i=1}^{n} r_i y_i$ in each interval $[d_{im}]$. In the first interval this sum is just $y_n$; its
value in the interval $[d_{im}]$ is $y_i$ larger than its value in the preceding interval. Its values are
therefore easily found in succession.

(e) Compute $\sum_{i=1}^{n} r_i^2$ in each interval $[d_{im}]$. In the first interval the sum is 1, and its value in
the interval $[d_{im}]$ is $(2m+1)$ larger than its value in the preceding interval.

(f) Compute in each interval $[d_{im}]$, from (c), (d) and (3), the value of $d$ at which $s^2/d^2$ is
stationary, if such a value exists. If the computed value lies in the interval, it is 'accepted'.

(g) If the value of $d$ obtained in (f) is accepted, compute the corresponding $s^2/d^2$ from (c),
(d), (e) and (4).

These calculations result in a list of stationary values of $s^2/d^2$, and the values of $d$ at which
they occur. Maxima and points of inflexion will occur in the list but can be neglected; the
cusps which occur at $d_{im}$ do not appear in the list. If, of two minima, that corresponding to
the smaller $d$ is at the larger (or equal) value of $s^2/d^2$, it may generally be ignored. This is
because small values of $s^2/d^2$ are expected at small values of $d$ even on the rectangular
hypothesis. The list of minima may therefore generally be reduced to a list of successive
absolute minima, in order of decreasing $d$.

A short example of this calculation will now be given. Suppose five values are observed,
1, 9, 10, 19 and 21, and that 3 is the value taken for $d^*$. Once the $d_{im}$ are computed and
ordered, the $y_i$ and $(2m+1)$ appropriate to each interval $[d_{im}]$ may be given as in Table 1.


Table 1. Calculation of possible estimates for a quantum

| $d_{im}$ (upper end of interval) | $y_i$ | $(2m+1)$ | $\sum r_i y_i$ | $\sum r_i^2$ | $d$ | $s^2/d^2$ |
|---|---|---|---|---|---|---|
| 21 | 21 | 1 | 21 | 1 | (23.43) | - |
| 19 | 19 | 1 | 40 | 2 | 12.30 | 0.294 |
| 10 | 10 | 1 | 50 | 3 | 9.84 | 0.367 |
| 9 | 9 | 1 | 59 | 4 | 8.33 | 0.370 |
| 7 | 21 | 3 | 80 | 7 | (6.15) | - |
| 6.3 | 19 | 3 | 99 | 10 | 4.97 | 0.032 |
| 4.2 | 21 | 5 | 120 | 15 | (4.10) | - |
| 3.8 | 19 | 5 | 139 | 20 | 3.54 | 0.292 |
| 3.3 | 10 | 3 | 149 | 23 | 3.30 | 0.350 |


The running totals of the second and third columns appear as the fourth and fifth columns.
The estimate of $d$ given by (3) follows, the values not accepted (not lying within their
appropriate intervals) being given in brackets. Finally, the corresponding value of $s^2/d^2$
is given.
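
Steps (a) to (g) can be sketched in Python as below. The function names `stationary_values` and `successive_minima` are hypothetical, and the treatment of interval ends (closed at the upper end, breakpoints kept only while above $d^*$) is one reading of the description; the output is not guaranteed to agree with every row of the printed table.

```python
def stationary_values(ys, d_star):
    """Steps (a)-(g): return (d, s2_over_d2) at each accepted stationary value, largest d first."""
    ys = sorted(ys)
    n = len(ys)
    # (a), (b): breakpoints d_im = y_i / (2m + 1) above d_star, in decreasing order.
    breaks = []
    for y in ys:
        m = 0
        while y / (2 * m + 1) > d_star:
            breaks.append((y / (2 * m + 1), y, 2 * m + 1))
            m += 1
    breaks.sort(reverse=True)
    sum_y2 = sum(y * y for y in ys)  # (c)
    sum_ry = 0                       # running total for (d)
    sum_r2 = 0                       # running total for (e)
    accepted = []
    for k, (upper, y, odd) in enumerate(breaks):
        sum_ry += y
        sum_r2 += odd
        lower = breaks[k + 1][0] if k + 1 < len(breaks) else d_star
        d = sum_y2 / (2 * sum_ry)                          # (f), from (3)
        if lower < d <= upper:                             # interval closed at its upper end
            s2 = 4 * (sum_r2 - sum_ry ** 2 / sum_y2) / n   # (g), from (4)
            accepted.append((d, s2))
    return accepted


def successive_minima(stationary):
    """Keep each entry whose s2/d2 is lower than every entry at larger d."""
    kept, best = [], float("inf")
    for d, s2 in stationary:
        if s2 < best:
            kept.append((d, s2))
            best = s2
    return kept
```

Called as `stationary_values([1, 9, 10, 19, 21], 3)`, the sketch walks through the same intervals as the worked example, and `successive_minima` then reduces the list in the manner described for successive absolute minima.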

The list of successive absolute minima of $s^2/d^2$ has in this case only two entries, 0.294 and
0.032. Of the five entries in the last column of Table 1, four are near the expected value $\tfrac{1}{3}$ on
the rectangular hypothesis. But at $d = 4.97$ the low value of 0.032 has been obtained for
$s^2/d^2$, which indicates that the observations are grouped about zero or integer multiples of
$2d$, i.e. 0, 9.94 and 19.88. In this simple case, a value of $2d$ near 10 could have been deduced
by inspection. An experimenter who made such observations would be liable to announce
a quantum law, and to account for the small discrepancies from the law as experimental
error. An estimator of the variance about the means was given in the previous paper
(Table 2, Broadbent, 1955); for these data we estimate the standard deviation to be 0.9.


3. $s^2/d^2$ ON THE RECTANGULAR HYPOTHESIS


We return to the question asked by the experimenter in the first section, and now phrase it:
'By studying my observations I have found a possible quantum $2d$ such that $s^2/d^2$ seems to
me remarkably small for such a small $1/d$. Is its value so small as to cast doubt on the
rectangular hypothesis?'

We attempt to answer this question by comparing the experimenter's values of $s^2/d^2$ and
$1/d$ with values found in a sampling experiment in which the rectangular hypothesis held good.
Two difficulties arise in making this comparison. 

The first difficulty is that points in the ($s^2/d^2$, $1/d$) plane cannot at present be completely
ordered in their departure from the expected value on the rectangular hypothesis. Suppose
the experimenter obtains the small values $s_1^2/d_1^2$ and $s_2^2/d_2^2$ at $1/d_1$ and $1/d_2$, and that $1/d_1 < 1/d_2$.
If $s_1^2/d_1^2 < s_2^2/d_2^2$, he will certainly consider $2d_1$ a more likely value for a quantum than $2d_2$ for
the reasons already given. But if $s_1^2/d_1^2 > s_2^2/d_2^2$, he must choose on rather intuitive grounds
which is the more likely quantum. For example, in Fig. 1, the successive absolute minima
of $s^2/d^2$ have been circled, and at each value below a conventional significance level the
corresponding $2d$ might be chosen as quantum. In other words, each of the successive
absolute minima (circled points in Fig. 1) corresponds to a possible candidate for consideration
as a quantum. Is the ordinate at $1/d = 2.0$ (where $s^2/d^2 = 0.22$) more startling than the
ordinate at the much larger $1/d = 80.1$ (where $s^2/d^2$ has the smaller value 0.13)? In the absence
of knowledge of the distribution of such minima, and without adopting some principle of
ordering, the question cannot be answered, and for this reason an exact test of significance
of the rectangular hypothesis does not seem possible. However, a general comparison of the
experimenter's values with those obtained on the rectangular hypothesis may still be made.

The second difficulty is that the exact distribution of successive minima on the rectangular 
hypothesis is not known, and the sampling experiment was limited in two ways. It was 
possible to take only a few values of n; this disadvantage is to some extent overcome later. 
Only a rectangular parent distribution has been used. Although the distributions of $s^2/\delta^2$ for
a unimodal and a rectangular parent distribution are very similar, it may not be true to say
the same for the distributions of successive minima of $s^2/d^2$. In the comparison suggested
below, the largest observation is an important statistic, and its distribution for the two 
types of parent is very different. The comparison described is therefore only valid when 
a rectangular parent distribution is a reasonable alternative to the quantum hypothesis, 
although it may be illuminating in other cases. 

The sampling experiment was carried out by the Mathematics Division of the National
Physical Laboratory. For a given value of $n$, $(n-1)$ numbers between 0 and 1 were taken
from the Rand random numbers, and the number 1 added to the set. In this way sets were
obtained with the same distribution as that given by the following procedure: take $n$ random
numbers distributed uniformly between 0 and $a$ ($0 < a < \infty$) and divide each by the largest
number taken. At first sight the two procedures appear different, but in each the $i$th ordered
observation $y_i$ may be shown to have the distribution

$$(n-1)\binom{n-2}{i-1}\, y^{i-1}(1-y)^{n-i-1}\,dy \qquad (0 < y < 1;\ i = 1, 2, \ldots, n-1),$$

and similarly the joint distributions of the $y_i$ are the same in each case. Since $s^2/d^2$ is dimensionless
the actual value of $a$ and the fact that the largest observation in the calculations is
always 1 are irrelevant in applications. That is, the distribution of $s^2/d^2$ obtained in the
sampling experiment is directly comparable with that of $s^2/d^2$ obtained when the parent
distribution is uniform between 0 and $a$.
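
The two ways of generating a set can be sketched as follows. The names `rectangular_sample` and `scaled_uniform_sample` are hypothetical, and Python's pseudo-random generator stands in for the Rand tables, so this is an illustration of the construction and not a reproduction of the experiment.

```python
import random


def rectangular_sample(n, rng):
    """n - 1 uniform numbers on (0, 1) with the number 1 adjoined, in increasing order."""
    return sorted(rng.random() for _ in range(n - 1)) + [1.0]


def scaled_uniform_sample(n, a, rng):
    """n uniform numbers on (0, a), each divided by the largest, in increasing order."""
    xs = [a * rng.random() for _ in range(n)]
    top = max(xs)
    return sorted(x / top for x in xs)
```

Under either construction the largest value is 1, and averaging the smallest value over many sets with $n = 5$ should give a figure close to $1/5$, the mean of the smallest of four uniform numbers.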

In the sampling experiment 100 sets for $n = 5$ and for $n = 20$ observations were produced,
and 50 sets for $n = 10$ and for $n = 50$. For each set, the calculation described in §2 was
completed and a table of successive absolute minima of $s^2/d^2$ was formed. The small
value $d^*$ was 0.01, i.e. possible quanta as small as 1/50 of the range of the observations were
considered. The size of the calculations is indicated by the number of $d_{im}$ formed: 140,000.

For one set of each sample size a complete table of accepted $d$ and corresponding $s^2/d^2$ was
printed. One of these tables of stationary values of $s^2/d^2$ is given in Fig. 1 as a graph of $s^2/d^2$
against $1/d$. The other three were of a similar nature. Four comments may be made:

(i) For each sample size n, the density of stationary values seems to remain approximately 
constant as 1/d increases. 

(ii) The density of stationary values increases with sample size n. 

(iii) The range of the oscillations of $s^2/d^2$ decreases as $n$ increases.

(iv) Cusps (possibly maxima) at $d_{im}$ have been ignored.

The tables of successive minima have about the same average number of entries (5.5),
i.e. there are nine successive minima in Fig. 1, but in the average sample only 5.5. The
scatter diagrams in Figs. 2-5 show these entries as points in the ($s^2/d^2$, $1/d$) plane. As $1/d$
increases, the values of the minima fall sharply at first and after that decrease slowly. Fig. 2
shows that for $n = 5$ very small values of $s^2/d^2$ are obtained even for small values of $1/d$. The
points circled in Fig. 2 are described below. It is therefore difficult (as might be expected)
to substantiate a quantum hypothesis with as few as five observations; startling agreement
with the hypothesis may be obtained, but if the experimenter has been free to select the
quantum similar agreement may be obtained on the rectangular hypothesis.

On the rectangular hypothesis, $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ is approximately normally distributed, with
mean zero and variance 4/45; it is independent of $d$. Inspection of Figs. 3-5 suggests the
following conjecture, that $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ is approximately independent of $n$ and so depends
on $d$ only, provided that $n$ is greater than 5. On this conjecture, we may use this statistic
in a test of a quantum hypothesis for any $n$ greater than 5, its distribution being obtained
from the data of the sampling experiment. A large positive value of $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ indicates
that the observations are grouped about zero or integer multiples of $2d$. Fig. 6 gives
successive maxima of $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ for 50 sets of $n = 10$, 20 and 50 observations (i.e. half
the data of Fig. 4, and all Figs. 3 and 5 are represented in Fig. 6).
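
For a quantum fixed in advance, the normal approximation with mean zero and variance 4/45 gives upper percentage points directly. The sketch below assumes that the scaled deviations behave as a uniform variable $U$ on $(-1, 1)$, so that the variance is $E(U^4) - \{E(U^2)\}^2$; the names `upper_point` and `quantum_statistic` are hypothetical.

```python
from fractions import Fraction
from statistics import NormalDist

# Assumed model: U uniform on (-1, 1), so E(U^2) = 1/3 and E(U^4) = 1/5.
VARIANCE = Fraction(1, 5) - Fraction(1, 3) ** 2   # equals 4/45

approx = NormalDist(mu=0.0, sigma=float(VARIANCE) ** 0.5)


def upper_point(tail):
    """Value exceeded with probability `tail` under the normal approximation."""
    return approx.inv_cdf(1 - tail)


def quantum_statistic(n, s2_over_d2):
    """sqrt(n) * (1/3 - s^2/d^2) for a sample of n observations."""
    return n ** 0.5 * (Fraction(1, 3) - s2_over_d2)
```

These points apply only when the quantum is not chosen from the same data; when it is, the comparison with the sampling experiment is the relevant one.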

Fig. 6 supports the conjecture in that the points for the three values of $n$ appear quite well
mixed. For $1/d > 50$ there are 45, 45 and 50 points for $n = 10$, 20 and 50 respectively. The
means of the respective ordinates are 0.69 (S.D. 0.028), 0.76 (S.D. 0.028) and 0.76 (S.D. 0.017).
The largest difference between the three pairs of means is greater than zero by 1.8 times
its S.D.

The test of a quantum hypothesis proposed consists in comparing the values $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$
and $y_n/d$ obtained by the experimenter with Fig. 6 ($y_n$ being equal to 1 in the sampling
experiment). If the experimenter's point appears consistent with the other points on the
figure, there is no reason to doubt the rectangular hypothesis. If his point lies above and to
the left of the others, it indicates that his quantum hypothesis is obeyed by the data more
precisely than chance warrants on the rectangular hypothesis. As a rough guide it may be
said that

$$\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\} > 1$$

is the condition that the data must fulfil to validate the quantum hypothesis. In comparison,
the 5, 1 and 0.1 % points of $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ are about 0.49, 0.69 and 0.92, since its mean is zero
and variance 4/45. When the experimenter is at liberty to choose the quantum, in order to

It will probably occur to the reader that a quantum hypothesis based on a single set of
data could also be tested by a random division of the data into two sets. The first half of the 
data would be used to estimate a possible quantum, and this value would be tested with the 
second half of the data. Two difficulties arise in this treatment. The first is that two 
statisticians using this method might disagree in their judgement of the same data (because 
it had been divided in different ways), although of course their probabilities of error of the
first and second kinds are the same. The second difficulty is that the decision, arrived at 
after inspecting all the data, to test for the existence of a quantum will entail a rough guess 
at the value of the quantum. This guess will generally influence the estimation of the 
quantum from half the data, for the estimation generally requires knowledge of the 
coefficients $r_i$ or the ability to decide which modes in the data are spurious. A quantum
deduced in this way will generally not be independent of the second half of the data. The 
division by Prof. Thom of Druid Circles into English and Scottish circles is an example of 
the second difficulty of this treatment of data. 

In general terms, even when a quantum hypothesis is true, it is unlikely to be confirmed 
by a single set of data unless the S.D. of $\epsilon_i$ is small in relation to $\delta$ and $n$ is large. And in these
conditions the hypothesis will hardly require formal justification. In other cases Fig. 6 
shows that it is difficult to distinguish between a genuine quantum ‘law’ and a spurious 
quantum fitted to data. Further independent observations may then be the only way of 
strengthening belief in a quantum hypothesis. 


4. AN EXAMPLE OF A QUANTUM HYPOTHESIS


After surveying the data of excitation energies of nuclei, Grant (1952) proposed the quantum 
hypothesis that the observed energy levels were integer multiples of a particular energy for 
each nucleus. Accurate measurements of two or more energy levels were available for about 
fifty nuclei; Grant gave the probability of the random occurrence of the observed integral 
sequences as approximately 10~-*°, in other words he believed the data were overwhelmingly 
in favour of the hypothesis. 

The five largest numbers of observations made on individual nuclei were with $n = 7$, 8,
10, 10 and 12; the observations† ($y_i$) are given in Table 2. The quanta proposed by Grant
are based on inspection of the data. Making use of the vectors $r(d)$ which follow from these
quanta, estimates of $2d$ were recalculated by (3) and the corresponding $s^2/d^2$ were calculated
by (4). It will now be considered whether the data of Table 2 tend to disprove the rectangular
hypothesis.

We first notice that the agreement between columns $y_i$ and $2r_id$ for each nucleus appears
intuitively good, and the values of $s^2/d^2$ calculated appear to be well below $\tfrac{1}{3}$. However, two
of these values are certainly not significant (for Kr and N), i.e. for these observations
$\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ is below the 5% level of $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ on the rectangular hypothesis. The
remaining three would uncritically be judged significant.
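
The recalculation by (3) and (4) can be illustrated on the seven Kr observations of Table 2. In this sketch the integers `kr_r` are an assumption: they are inferred by dividing each tabulated $2r_id$ by the tabulated quantum, and the function name `fit_quantum` is hypothetical.

```python
from math import sqrt


def fit_quantum(ys, rs):
    """Return (2d, s^2/d^2, sqrt(n) * (1/3 - s^2/d^2)) for fixed integers r_i."""
    sum_y2 = sum(y * y for y in ys)
    sum_ry = sum(r * y for r, y in zip(rs, ys))
    sum_r2 = sum(r * r for r in rs)
    two_d = sum_y2 / sum_ry                                  # equation (3)
    s2_d2 = 4 * (sum_r2 - sum_ry ** 2 / sum_y2) / len(ys)    # equation (4)
    stat = sqrt(len(ys)) * (1 / 3 - s2_d2)
    return two_d, s2_d2, stat


# Kr observations from Table 2; the r_i are inferred, not printed in the paper.
kr_y = [0.550, 0.610, 0.689, 0.768, 0.825, 1.038, 1.315]
kr_r = [7, 8, 9, 10, 11, 13, 17]
two_d, s2_d2, stat = fit_quantum(kr_y, kr_r)
```

With these inputs the three results can be set beside the Kr entries in the lower rows of Table 2, and the last of them beside the 5% point of about 0.49.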

In Fig. 7 the data are represented on a graph similar to Fig. 6. On this figure are drawn 
also very approximate curves for the mean and upper 5 and 1 % points for the minima 
plotted on Fig. 6 (for reasons already given these curves can only be rough indications of 
significance levels). It now appears that the two agreements for the two nuclei already 
mentioned are in fact rather worse than one would expect on the rectangular hypothesis, 
two more (O and B) appear to be typical, and one only (RaC′) can be judged significant.

Similar conclusions can be drawn from the following comparison. In the last row of 
Table 2 are given the reported standard deviations of the observations, and in the row above 
the estimated standard deviations about the quanta calculated. For the first four nuclei, 
the reported standard deviation is too small to account for the variation actually found. 
For RaC′ the two are equal.


† Data privately communicated to the author.

For reasons already given, the remainder of the data is not sufficiently numerous to throw
doubt on the rectangular hypothesis. For example, the seven sets of $n = 5$ observations gave
the points circled in Fig. 2 on the ($s^2/d^2$, $1/d$) diagram. Although some low values of $s^2/d^2$
were obtained, it was hardly possible to get values lower than those found on the rectangular
hypothesis.


Table 2

| Nucleus | Kr | | N | | B | | O | | RaC′ | |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of observations | 7 | | 8 | | 10 | | 10 | | 12 | |
| | $y_i$ | $2r_id$ | $y_i$ | $2r_id$ | $y_i$ | $2r_id$ | $y_i$ | $2r_id$ | $y_i$ | $2r_id$ |
| | 0.550 | 0.542 | 5.276 | 5.338 | 2.141 | 2.144 | 0.875 | 0.635 | 0.608 | 0.612 |
| | 0.610 | 0.619 | 5.305 | 5.338 | 4.457 | 4.467 | 3.02 | 3.176 | 1.283 | 1.286 |
| | 0.689 | 0.697 | 6.328 | 6.322 | 5.033 | 5.003 | 3.88 | 3.811 | 1.412 | 1.409 |
| | 0.768 | 0.774 | 7.164 | 7.164 | 6.752 | 6.789 | 4.54 | 4.446 | 1.663 | 1.653 |
| | 0.825 | 0.851 | 7.309 | 7.305 | 6.802 | 6.789 | 5.17 | 5.081 | 1.844 | 1.837 |
| | 1.038 | 1.006 | 8.315 | 8.288 | 7.297 | 7.325 | 5.72 | 5.716 | 2.015 | 2.021 |
| | 1.315 | 1.316 | 9.156 | 9.131 | 8.565 | 8.576 | 6.33 | 6.352 | 2.138 | 2.143 |
| | - | - | 10.816 | 10.817 | 8.921 | 8.934 | 6.93 | 6.987 | 2.268 | 2.266 |
| | - | - | - | - | 9.185 | 9.112 | 7.60 | 7.622 | 2.439 | 2.450 |
| | - | - | - | - | 9.272 | 9.291 | 8.23 | 8.257 | 2.513 | 2.511 |
| | - | - | - | - | - | - | - | - | 2.697 | 2.695 |
| | - | - | - | - | - | - | - | - | 2.880 | 2.878 |
| $2d$ | 0.07739 | | 0.14048 | | 0.17867 | | 0.63515 | | 0.06124 | |
| $s^2/d^2$ | 0.1864 | | 0.1612 | | 0.1169 | | 0.1072 | | 0.0331 | |
| $\sqrt{n}\,\{\tfrac{1}{3} - s^2/d^2\}$ | 0.3887 | | 0.4867 | | 0.6844 | | 0.7194 | | 1.0401 | |
| $y_n/d$ | 34.0 | | 154.0 | | 103.7 | | 25.9 | | 94.1 | |
| Estimated S.D. about quanta | 0.017 | | 0.029 | | 0.031 | | 0.104 | | 0.006 | |
| Reported S.D. | 0.003 | | 0.006 | | 0.008 | | 0.020 | | 0.006 | |

By itself the nucleus RaC′ might be considered to validate a quantum hypothesis, but
as a whole the data must be thought of as consistent with the rectangular hypothesis.


Part of this paper was written at the Atomic Energy Research Establishment, Harwell, 
during a vacation consultancy, and I am indebted to the Establishment, particularly to 
Dr J. Howlett, for supporting the work of calculation. I am indebted to Mr C. W. Nott, of 
the Mathematics Division of the National Physical Laboratory, who carried out the calculations
on the sampling experiment. The paper has benefited from my discussions with
Prof. G. A. Barnard, and is published by permission of the British Coal Utilization Research
Association.


REFERENCES 


BROADBENT, S. R. (1955). Biometrika, 42, 45.

FISHER, R. A. (1924). J. R. Statist. Soc. 87, 442.

GRANT, P. J. (1952). Proc. Phys. Soc. Lond. A, 65, 150.

IRWIN, J. O. (1956). J. R. Statist. Soc. B (in the Press).

KENDALL, M. G. (1946). The Advanced Theory of Statistics, 2, 435.

RUDRA, A. (1955). Sankhyā, 15, 9.

THOM, A. (1955). J. R. Statist. Soc. A, 118, 275.







THE NUMBER OF NEW SPECIES, AND THE INCREASE IN 
POPULATION COVERAGE, WHEN A SAMPLE IS INCREASED 


By I. J. GOOD AND G. H. TOULMIN


A sample of size $N$ is drawn at random from a population of animals of various species.
Methods are given for estimating, knowing only the contents of this sample, the number of
species which will be represented $r$ times in a second sample of size $\lambda N$; these also enable us
to estimate the number of different species and the proportion of the whole population
represented in the second sample. A formula is found for the variance of the estimate; when
$\lambda > 2$, this variance becomes in general very large, so that the estimate is useless without
some modification. This difficulty can be partly overcome, at least for $\lambda < 5$, by using Euler's
method with a suitable parameter or the methods described by Shanks (1955) to hasten the
convergence of the series by which the estimate is expressed. The methods are applied to
samples of words from Our Mutual Friend, to an entomological sample, and to a sample of
nouns from Macaulay's essay on Bacon.


1. INTRODUCTION 


We present here a further development of the theory expounded by Good (1953); that paper 
will be referred to, for brevity, by the letter G throughout. 

We imagine a random sample of size $N$, the basic sample, to be drawn from an infinite
population of animals of various species, and suppose that $n_r$ distinct species are each
represented exactly $r$ times in the sample, so that

$$\sum_{r=1}^{\infty} r\,n_r = N. \qquad (1)$$

We write
$$d = \sum_{r=1}^{\infty} n_r,$$
the total number of distinct species in the sample. It is convenient (though, as was pointed
out in G, not essential) to suppose that the total number of distinct species in the population
is a known finite number $s$, so that we can calculate

$$n_0 = s - d, \qquad (2)$$

the number of species not represented in the sample. If the actual value of $s$ is not known,
all our results will remain true if it is arbitrarily assumed to be any sufficiently large number.
As in G (p. 237), the larger $n_1$ is, the more applicable our results are. In G it was shown that
certain properties of the population could be deduced approximately from the sample
frequencies $n_r$; in particular, the total coverage of the sample (i.e. the proportion of the
population represented in the sample, which is the sum of the population frequencies $p_\mu$ of
the species represented) is approximately

$$1 - \frac{\mathcal{E}(n_1)}{N},\dagger \qquad (3)$$

provided $n_1$ is large (G, formula (9)).
† $\mathcal{E}(n_r)$ is the expected value of the random variable $n_r$ when our basic sample of $N$ specimens is
taken at random. We shall use the same symbol $n_r$ both for this random variable and for a particular
value of it.
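
The quantities $n_r$, $N$, $d$ and the coverage formula (3) can be computed from a list of specimens as in the sketch below. The function names are hypothetical, the observed $n_1$ is used in place of its expectation, and the toy sample in the usage note is invented purely for illustration; it is far too small for the approximation to be trusted.

```python
from collections import Counter


def frequency_of_frequencies(sample):
    """Return {r: n_r}, the number of distinct species seen exactly r times."""
    species_counts = Counter(sample)
    return dict(Counter(species_counts.values()))


def sample_size(n_by_r):
    """Equation (1): N is the sum of r * n_r."""
    return sum(r * n for r, n in n_by_r.items())


def distinct_species(n_by_r):
    """d, the sum of the n_r over r >= 1."""
    return sum(n_by_r.values())


def coverage_estimate(n_by_r):
    """Formula (3), with the observed n_1 standing in for its expected value."""
    return 1 - n_by_r.get(1, 0) / sample_size(n_by_r)
```

For the invented sample a, a, a, b, b, c, d, d, e this gives $n_1 = 2$, $n_2 = 2$, $n_3 = 1$, $N = 9$, $d = 5$ and an estimated coverage of $7/9$.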

We now contemplate taking a second sample, of size $\lambda N$. We describe this as the 'second
sample', even though it may be (and in practice probably will be) an enlargement of the
basic sample; in this case, of course, $\lambda > 1$. If the second sample is not an enlargement of the
basic sample, it will be termed independent; this word may be interpreted in its probabilistic
sense, provided that the true statistical hypothesis specifying the population frequencies
is momentarily regarded as 'given'. Except in §4 our results apply to both enlargements
and independent second samples.

We may now wish, for example: 

(a) To find the expected coverage of the second sample. 

(b) To find the expected number of distinct species in the second sample. 

(c) To find (roughly) the variances of estimates of population parameters which might be 
made from the second sample. 

(d) To estimate the term $\mathcal{E}_{2N}(n_{2r} \mid H)$ in formula (22) of G for the variance of $n_r$.

Results of this type may enable us to decide whether it is worth enlarging our sample, and 
to what extent, depending on the purposes for which it is required. 

For example, consider a teacher of languages who wishes to base his teaching on the 
population frequencies of words. He will wish to estimate what size of vocabulary should 
be learnt by a student in order to decrease the need for reference to a dictionary below 
a certain frequency. It was shown in G how a sample can be used to make such an estimate. 
The present paper shows in what way the sample can also be used in order to help the 
decision of whether to carry out more sampling. For instance, in example (iii) of §6 below, 
the 2048 words of the basic sample had an expected coverage of 87.3 %, and we find that if
the sample size were doubled, then the same expected coverage could be obtained by 
selecting only 1780 words. More work by the teacher means less for the student. 

Similarly, an entomologist will often want to know whether to increase a basic sample, 
and will be able to base his decision largely on the expected number of new species that will 
be provided by a given amount of sampling. Example (ii) of §6 is an instance of such an 
application. 

Let $n_r(\lambda)$ be the random variable whose value is the number of distinct species represented
exactly $r$ times in the second sample.†

We first consider a method that may appeal to statisticians who are accustomed to fit 
distributions by the method of moments. The method will not, however, be used in our 
examples if only because of the enormous amount of calculation that it requires. We begin 
by stating a lemma that is presumably well known, although we cannot give a reference. 


Lemma. (Determination of a set of numbers whose 'factorial moments' are specified.) If

$$b_t = \sum_{r=0}^{\infty} r^{(t)} a_r \qquad (t = 0, 1, 2, \ldots), \qquad (4)$$

then
$$a_r = \frac{1}{r!} \sum_{i=0}^{\infty} \frac{(-1)^i\, b_{r+i}}{i!}, \qquad (5)$$

at any rate if $a_r = 0$ for all sufficiently large $r$.‡ Problems (a), (b) and (d) above reduce to the
estimation of the numbers $\mathcal{E}(n_r(\lambda))$ for certain values of $r$ when values of $n_r(1) = n_r$ are
observed; the results are given in equations (22), (23) and (32), respectively. The same is
true of problem (c) when the variances concerned can be expressed in terms of the $n_r(\lambda)$;
thus equations (30A) and (31) of G show that this is the case for Yule's 'characteristic',
which is an estimate of $\sum_{\mu=1}^{s} p_\mu^2$.

† It is convenient here to depart slightly from the notation of G. The number which we write as
$\mathcal{E}(n_r(\lambda))$ would there have been denoted by $\mathcal{E}_{\lambda N}(n_r)$. Note that in the case of an enlarged sample,
$n_r(\lambda)$ is to be considered as varying as the whole enlarged sample is varied at random, not merely the
$(\lambda - 1)N$ additional specimens, so that this correspondence of notation still holds.

‡ For a proof under more general conditions, see the Appendix p. 62 below.
Bp=1 





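Now
$$\frac{1}{N^{(t)}} \sum_{r=0}^{\infty} r^{(t)} n_r$$
is an unbiased estimate of $c_t = \sum_{\mu=1}^{s} p_\mu^t$ (see, for example, G, p. 245). Similarly

$$\mathcal{E}\left\{\frac{1}{(\lambda N)^{(t)}} \sum_{r=0}^{\infty} r^{(t)} n_r(\lambda)\right\} = \sum_{\mu=1}^{s} p_\mu^t. \qquad (6)$$

It may therefore seem reasonable to assume

$$\sum_{r=0}^{\infty} r^{(t)}\, \mathcal{E}(n_r(\lambda)) = \frac{(\lambda N)^{(t)}}{N^{(t)}} \sum_{r=0}^{\infty} r^{(t)} n_r \qquad (7)$$

and then to solve for $\mathcal{E}(n_r(\lambda))$ by using the lemma.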
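
The inversion stated in the lemma can be checked numerically on a short sequence. The sketch assumes that the factorial power in (4) means $r(r-1)\cdots(r-t+1)$, as the phrase 'factorial moments' suggests; the function names are hypothetical, and exact rational arithmetic is used so that the check is not blurred by rounding.

```python
from fractions import Fraction
from math import factorial


def falling(r, t):
    """r(r-1)...(r-t+1), taken as 1 when t = 0 (assumed meaning of the factorial power)."""
    out = 1
    for j in range(t):
        out *= r - j
    return out


def factorial_moments(a, t_max):
    """b_t = sum over r of falling(r, t) * a_r, for t = 0, ..., t_max, as in (4)."""
    return [sum(falling(r, t) * a_r for r, a_r in enumerate(a)) for t in range(t_max + 1)]


def invert(b, r):
    """a_r = (1/r!) * sum over i of (-1)^i b_{r+i} / i!, as in (5)."""
    total = sum(Fraction((-1) ** i * b[r + i], factorial(i)) for i in range(len(b) - r))
    return total / factorial(r)
```

Because a sequence of length five has $b_t = 0$ for every $t > 4$, truncating the moments at $t = 4$ loses nothing, and `invert` recovers each $a_r$ exactly.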

In spite of the theoretical interest of this method it seems likely that it is not really
adequate. For it depends too much on the estimates of the higher population moments, $c_t$,
and these estimates are subject to large sampling errors. We have therefore not investigated
any numerical examples. Instead of proceeding via the factorial moments, we find directly
the following relation between $\mathcal{E}(n_r(\lambda))$ and the $\mathcal{E}(n_r)$:

$$\mathcal{E}(n_r(\lambda)) \approx \lambda^r \sum_{i=0}^{\infty} (-1)^i \binom{r+i}{i} (\lambda-1)^i\, \mathcal{E}(n_{r+i}) \qquad (8)$$

for any integer $r \ge 0$ (§2, equation (16)). If we assume that

$$\mathcal{E}(n_{r+i}) = n_{r+i} \qquad (9)$$

or
$$\mathcal{E}(n_{r+i}) = n'_{r+i}, \qquad (10)$$

where the numbers $n'_1, n'_2, n'_3, \ldots$ are obtained by smoothing the numbers $n_1, n_2, n_3, \ldots$ (see G,
§§3, 7, 8), we can estimate the values of $\mathcal{E}(n_r(\lambda))$.
We point out here that the series (8) is not really infinite: for

$$\mathcal{E}(n_{r+i}) = 0 \quad \text{whenever } r+i > N, \qquad (11)$$

and the upper limit of summation could therefore be replaced by $N - r$. If $\lambda > 2$, a practical
difficulty arises, in that the factor $(\lambda-1)^i$ increases rapidly with $i$, and so attaches great
weight to terms for which $\mathcal{E}(n_{r+i})$ is small and therefore is liable to a large percentage error
when estimated from the basic sample. It seems to be practicable to overcome this difficulty,
at least for moderate values of $\lambda$ (say $\lambda < 5$), by using a summation method to make the
series (8) converge rapidly; it is shown in §5 that Euler's method with a suitably chosen
parameter g is convenient. We have not, however, been able to justify this procedure by
finding a useful error term for the partial sums of the new series obtained.

We mention here two possibilities which we have not investigated practically. First, it
may be possible to reach larger values of $\lambda$ in two (or more) stages: e.g. to estimate $\mathcal{E}(n_r(4))$
we might, instead of using (8) directly with $\lambda = 4$, first estimate $\mathcal{E}(n_1(2))$, $\mathcal{E}(n_2(2))$,
$\mathcal{E}(n_3(2))$, ..., then smooth the values obtained, and again apply (8) with $\lambda = 2$ to estimate
$\mathcal{E}(n_r(4))$. Secondly, (8) with $1/\lambda$ in place of $\lambda$ and $n_r$, $n_r(\lambda)$ interchanged might be used as
a check on the results obtained, and this might provide a new method of smoothing.


We are much indebted to Dr J. Wishart for several suggestions and corrections. 


2. ESTIMATION OF $\mathcal{E}(n_r(\lambda))$


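Let $p_\mu$ ($\mu = 1, 2, \ldots, s$) be the population frequencies of the $s$ species. As in G, equation (10),

$$\mathcal{E}(n_r) = \sum_{\mu=1}^{s} \binom{N}{r} p_\mu^r (1-p_\mu)^{N-r}. \qquad (12)$$

(In G, the left-hand side is written as $\mathcal{E}_N(n_r \mid H)$. As explained above, we use the symbol $n_r$
only with reference to the basic sample, of size $N$; we omit the $H$, which refers to the
hypothesis that the population frequencies are $\{p_\mu\}$, because we shall not be concerned with
expectations on any other hypothesis.) For the second sample, we have similarly, assuming
$p_\mu < \tfrac{1}{2}$ for all $\mu$,

$$\begin{aligned}
\mathcal{E}(n_r(\lambda)) &= \sum_{\mu=1}^{s} \binom{\lambda N}{r} p_\mu^r (1-p_\mu)^{\lambda N-r} \\
&= \sum_{\mu=1}^{s} \binom{\lambda N}{r} p_\mu^r (1-p_\mu)^{N-r} \left(1 + \frac{p_\mu}{1-p_\mu}\right)^{-(\lambda-1)N} \\
&= \sum_{\mu=1}^{s} \binom{\lambda N}{r} p_\mu^r (1-p_\mu)^{N-r} \sum_{i=0}^{\infty} \binom{-(\lambda-1)N}{i} p_\mu^i (1-p_\mu)^{-i} \qquad (13) \\
&= \sum_{i=0}^{\infty} \binom{\lambda N}{r} \binom{-(\lambda-1)N}{i} \sum_{\mu=1}^{s} p_\mu^{r+i} (1-p_\mu)^{N-r-i} \\
&= \sum_{i=0}^{\infty} \frac{\binom{\lambda N}{r} \binom{-(\lambda-1)N}{i}}{\binom{N}{r+i}}\, \mathcal{E}(n_{r+i}). \qquad (14)
\end{aligned}$$

(14) is not rigorously correct, since for $r+i > N$, $\binom{N}{r+i}$ and $\mathcal{E}(n_{r+i})$ both vanish, and the
corresponding terms of the series are indeterminate. We notice, however, that if the infinite
upper limit for $i$ is replaced by an odd [even] integer, the left-hand side of (13) is greater
[less] than the right-hand side,† and the same therefore holds of (14). Thus the partial sums
of (14) are alternately greater and less than the left-hand side, and in all practical examples
a sufficiently good approximation is reached while $(r+i)$ is still small compared to $N$.
Provided that we use only terms of the series for which $r+i \ll N$ and $i \ll (\lambda-1)N$, we can
write

$$\frac{\binom{\lambda N}{r} \binom{-(\lambda-1)N}{i}}{\binom{N}{r+i}} \approx \frac{(\lambda N)^r}{r!} \cdot \frac{(-(\lambda-1)N)^i}{i!} \cdot \frac{(r+i)!}{N^{r+i}} = (-1)^i \lambda^r (\lambda-1)^i \binom{r+i}{i}. \qquad (15)$$

Hence
$$\mathcal{E}(n_r(\lambda)) \approx \lambda^r \sum_{i=0}^{\infty} (-1)^i \binom{r+i}{i} (\lambda-1)^i\, \mathcal{E}(n_{r+i}), \qquad (16)$$

the partial sums erring alternately in excess and defect.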
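
The quality of approximation (15) for a large basic sample can be examined numerically. The sketch below uses a generalised binomial coefficient so that the negative upper index in (14) is handled directly; the function names and the particular values of $N$, $\lambda$, $r$ and $i$ in the usage note are chosen only for illustration.

```python
from math import comb


def gen_binom(a, k):
    """Generalised binomial coefficient a(a-1)...(a-k+1)/k! for any real a and integer k >= 0."""
    out = 1.0
    for j in range(k):
        out *= (a - j) / (j + 1)
    return out


def exact_coefficient(N, lam, r, i):
    """Coefficient of E(n_{r+i}) in (14)."""
    return gen_binom(lam * N, r) * gen_binom(-(lam - 1) * N, i) / gen_binom(N, r + i)


def approx_coefficient(lam, r, i):
    """Right-hand side of (15)."""
    return (-1) ** i * lam ** r * (lam - 1) ** i * comb(r + i, i)
```

With $N = 10^6$, $\lambda = 2$, $r = 2$ and $i = 3$ the approximate coefficient is $-40$, and the exact one differs from it only by a relative amount of order $1/N$.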


† This follows from the $n$th Mean Value Theorem applied to $(1+x)^{-(\lambda-1)N}$.


(16) can also be obtained directly by using the Poisson approximation

$$\mathcal{E}(n_r) \approx \sum_{\mu=1}^{s} \frac{e^{-Np_\mu} (Np_\mu)^r}{r!}. \qquad (17)$$
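
The Poisson route to (16) may be sketched as follows. The steps are supplied here as an illustration and are not quoted from the paper; the same approximation is applied to the second sample of size $\lambda N$, and the exponential $e^{-(\lambda-1)Np_\mu}$ is expanded as a power series.

```latex
\begin{aligned}
\mathcal{E}(n_r(\lambda)) &\approx \sum_{\mu=1}^{s} \frac{e^{-\lambda N p_\mu} (\lambda N p_\mu)^r}{r!}
  = \lambda^r \sum_{\mu=1}^{s} \frac{e^{-N p_\mu} (N p_\mu)^r}{r!}\, e^{-(\lambda-1) N p_\mu} \\
&= \lambda^r \sum_{\mu=1}^{s} \frac{e^{-N p_\mu} (N p_\mu)^r}{r!}
   \sum_{i=0}^{\infty} \frac{(-1)^i (\lambda-1)^i (N p_\mu)^i}{i!} \\
&= \lambda^r \sum_{i=0}^{\infty} (-1)^i (\lambda-1)^i \frac{(r+i)!}{r!\, i!}
   \sum_{\mu=1}^{s} \frac{e^{-N p_\mu} (N p_\mu)^{r+i}}{(r+i)!} \\
&\approx \lambda^r \sum_{i=0}^{\infty} (-1)^i \binom{r+i}{i} (\lambda-1)^i\, \mathcal{E}(n_{r+i}).
\end{aligned}
```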
We define an estimate of &(n,(A)) by 
@,(A) = ar SS (—1)! (*") (A=1ins3 (18) 
i=0 
then (16) gives us E(2,(A)) =E(n,(A)). 


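For the case $r = 0$, we may seem to need to assume the value of $s$, but this assumption is not really required since we can write

$$\hat{d}(\lambda) = d - \sum_{i=1}^{\infty} (-1)^i (\lambda-1)^i n_i = s - \hat{n}_0(\lambda), \tag{19}$$

so that $\mathcal{E}(\hat{d}(\lambda)) \approx \mathcal{E}(d(\lambda))$.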
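As an illustration only, the estimates (18) and (19) can be sketched in a few lines of Python. The function names `predict_n_r` and `predict_d`, the dictionary layout of the counts, and the toy counts themselves are assumptions made for this sketch and do not come from the paper; the series is simply truncated at the largest observed frequency, with no smoothing or summation device.

```python
from math import comb

def predict_n_r(n, r, lam):
    """Truncated form of (18).

    n maps a frequency r to the observed count n_r (missing keys count as 0).
    The infinite series is cut off at the largest observed frequency.
    """
    max_r = max(n)
    total = 0.0
    for i in range(max_r - r + 1):
        total += (-1) ** i * comb(r + i, i) * (lam - 1) ** i * n.get(r + i, 0)
    return lam ** r * total

def predict_d(n, lam):
    """Truncated form of (19): d minus the alternating series in the n_i."""
    d = sum(n.values())  # number of distinct species in the basic sample
    series = sum((-1) ** i * (lam - 1) ** i * n.get(i, 0)
                 for i in range(1, max(n) + 1))
    return d - series

# Hypothetical counts, chosen only to exercise the functions.
counts = {1: 10, 2: 4, 3: 1}
same_size = predict_n_r(counts, 1, 1.0)   # lambda = 1 reproduces n_1
doubled_n1 = predict_n_r(counts, 1, 2.0)  # 2 * (10 - 2*4 + 3*1)
doubled_d = predict_d(counts, 2.0)        # 15 + 10 - 4 + 1
```

With $\lambda = 1$ every term after the first vanishes, so the sketch returns the observed $n_r$ and $d$ unchanged, as the definitions require.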


We have thus obtained (approximately) unbiased estimates of $\mathcal{E}(n_r(\lambda))$ and $\mathcal{E}(d(\lambda))$ in terms of the observed numbers $n_r$. We shall almost certainly obtain more accurate estimates if we replace the $n_r$ by smoothed values $n'_r$; for methods of smoothing see G, §§3, 7, 8.

In the case when the second sample is an enlargement of the basic one, and it is desired to predict the value of $n_r(\lambda)$ given the basic sample rather than $\mathcal{E}(n_r(\lambda))$, which we have defined to be the expectation when the whole of the second sample is varied at random, it seems intuitively clear that $n_r$ should not be replaced by $n'_r$ in (18), at any rate when $\lambda$ is not large, though the later terms probably should be smoothed; for instance, consider the case $\lambda = 1$. We have not attempted a rigorous treatment of this question. The advantage of smoothing is not as great when using formulae like (18) as when using such formulae as G (2'):

$$r^* = (r+1)\, n'_{r+1} / n'_r,$$

involving a ratio of the $n'_r$. Sometimes a ratio is involved surreptitiously, as in G (6), (6').

An important point in the argument leading to (14) was the expression of $(1-p_\mu)$ as

$$\left(1 + \frac{p_\mu}{1-p_\mu}\right)^{-1},$$

so that expansion led to terms expressible as functions of the $\mathcal{E}(n_r)$. This device can also be used to avoid the approximation made in G (lines 8 and 9 of p. 241) of replacing the expected value of $n_{r+m}$ for a sample of size $N+m$ by its expected value for a sample of size $N$. The result obtained is (using the notation of G)


; (r+my™ 1 & (r+m+if (—m\ , 
E(qr" | H) = (N—r)™ &(n,) = (N—r—m)\ i E (Mp. m+i) (20) 
, (r+m+1)m , 
(r+ myn Erm) — Arp mE leremen) +--+ 


= (Wry Fw) si 





The approximations made in G are thus equivalent to replacing (V—r)™ by N™ and 
neglecting the terms of the sum after the first; they are reasonable provided mr< WN. 
As noted in §1, we may be particularly interested in the coverage of the second sample, and the number of different species it will contain. By (3) and (16) with $r = 1$, the expected coverage is approximately

$$
\begin{aligned}
1 - \frac{\mathcal{E}(n_1(\lambda))}{\lambda N} &\approx 1 - \frac{1}{N}\left[\mathcal{E}(n_1) - 2(\lambda-1)\mathcal{E}(n_2) + 3(\lambda-1)^2 \mathcal{E}(n_3) - \ldots\right] \\
&\approx 1 - \frac{1}{N}\left[n_1 - 2(\lambda-1) n_2 + 3(\lambda-1)^2 n_3 - \ldots\right] \qquad (22)
\end{aligned}
$$

(or, more accurately, the same formula with $n'_r$ in place of $n_r$).† The expected number of distinct species represented is (by (19)) approximately

$$d + (\lambda-1) n_1 - (\lambda-1)^2 n_2 + \ldots; \tag{23}$$

i.e. in the case of an enlarged sample, the number of new species expected is approximately

$$(\lambda-1) n_1 - (\lambda-1)^2 n_2 + \ldots. \tag{24}$$

Evidently $n_1, n_2, \ldots$ may be replaced by smoothed values in (23) and (24), but $d$ should be replaced by the smoothed value

$$d' = n'_1 + n'_2 + \ldots$$

only in the case of an independent second sample. Note that (23) and (24) can be proved directly without assuming $s$ to be finite.


3. VARIANCE OF THE ESTIMATES $\hat{n}_r(\lambda)$


In this section we find an expression for the variance of the estimate $\hat{n}_r(\lambda)$ of $\mathcal{E}(n_r(\lambda))$ defined by (18). This must not be confused with the variance of $n_r(\lambda)$, which can be found from the formulae given in G, §5. $\hat{n}_r(\lambda)$ is a linear function of the random variables $n_{r+i}$, and varies accordingly when we take different basic samples; we can find its variance if we know the variances and covariances of the $n_r$. We therefore start by calculating these. By the method of G, §5, we find

$$
\begin{aligned}
\mathcal{E}(n_r n_s) &= \delta_{rs}\,\mathcal{E}(n_r) + \frac{N!}{r!\, s!\, (N-r-s)!} \sum_{\mu \neq \nu} p_\mu^{r} p_\nu^{s} (1-p_\mu-p_\nu)^{N-r-s} \\
&= \delta_{rs}\,\mathcal{E}(n_r) + \frac{N!}{r!\, s!\, (N-r-s)!} \left[\sum_{\mu,\nu} p_\mu^{r} p_\nu^{s} (1-p_\mu-p_\nu)^{N-r-s} - \sum_{\mu} p_\mu^{r+s} (1-2p_\mu)^{N-r-s}\right], \qquad (25)
\end{aligned}
$$

where $\delta_{rr} = 1$, $\delta_{rs} = 0$ if $r \neq s$.
Now 
N-r-s 
(1 a ae p,)” or ae -|a —p,)(Q -p,)(1- redo Py Pe) 
wl Py 





” Py p, \" ~~ 2 
1— ril i= 1 Lok... 
(1—P,) ( 2 ] (1p, (14; al am ( 


lg 1—p,1—p, 
> (;) 0-2, siti z Cena i 


i=0 


N-r-s N e 
«S (1p “Cd pht-p,)-*ek-p,% (26) 
k=0 


† It is correct to replace $n_r$ by $n'_r$, even when the second sample is an enlargement of the basic one, because the more accurate formula for the coverage, G (9'), uses $n'_1$ in place of $n_1$, so we are interested in $\mathcal{E}(n_1(\lambda))$ rather than the expected value of $n_1(\lambda)$ given the basic sample.
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N Ar 
and (1— 2p, = [1—p,)(1 Si 
N-1—8 .(N-—r-—s\ , ‘ . 
= = (- 1yé ( ; )ria = gyre. (27) 


Substituting from (26) and (27) in (25), 


N! _[s\ (r\ (N-r- 
E(n,N,) = 6,¢6 (n,) + r! s!(N—r ay —s)! , x ( Ss 1)k (;) (;) ( M ‘) 
* -aF (Upeern(h “ers 
N Pecan : : 
~2(- 1)é ( pe ‘) pea —p peed 
i 


mL ye OCT) 


eee 
i a é 


=~ (=D Else) 
(esaad 





O(N +i+k) 6 (Mer j+4) 


r+s+i 

: : (N-r-i-B!(N—s—j—B)!(r+i +B! (s+j+b)! 

= 6) — k ——————— ————— eee 
n(n) + B (—V) (N—r—s—h)! Nliljl kl (e—a)l(r—3)! 

(r+s+2)! 
ris 3! 4! 





x E (N+ i+4) é (54-54%) re x ( a 1)" E(Ny+9+) (28) 


py (N71) (N — 8 —j—B)! (r +i +b)! (8 +5 +4)! 


= FE (n,)+ B — ) (N-r—s—k)! Nil j! k! (s—i)! (r—j)! 
8s)! , 
x Eh sigt) B(gajyn)— 27 Fl (2), (29) 


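using (8). Provided $i, j, k, r, s$ are all $\ll N$, the coefficient in the first sum is $O((rs/N)^{i+j+k})$; and when $i = j = k = 0$, use of Stirling's formula shows that it is $1+O(rs/N)$. Hence, if $rs \ll N$,

$$
\begin{aligned}
\operatorname{cov}(n_r, n_s) &= \mathcal{E}(n_r n_s) - \mathcal{E}(n_r)\,\mathcal{E}(n_s) \\
&\approx \delta_{rs}\,\mathcal{E}(n_r) - 2^{-r-s} \binom{r+s}{r} \mathcal{E}(n_{r+s}(2)). \qquad (30)
\end{aligned}
$$

Notice that when $r = s$ we have equation (22) of G:

$$V(n_r) \approx \mathcal{E}(n_r) - 2^{-2r} \binom{2r}{r} \mathcal{E}(n_{2r}(2)), \tag{31}$$

or, expanding the second term by (8),

$$V(n_r) \approx \mathcal{E}(n_r) - \sum_{i=0}^{\infty} (-1)^i \frac{(2r+i)!}{(r!)^2\, i!}\, \mathcal{E}(n_{2r+i}). \tag{32}$$

For the case $r = 0$, since $s$ is constant, we have

$$V(d) = V(n_0) \approx \mathcal{E}(d(2)) - \mathcal{E}(d) \approx \mathcal{E}(n_1) - \mathcal{E}(n_2) + \ldots. \tag{33}$$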


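Using (30), we can now find the variance of $\hat{n}_r(\lambda)$. From the elementary formula for the variance of a linear form:

$$V\Big(\sum_i a_i x_i\Big) = \sum_{i,j} a_i a_j \operatorname{cov}(x_i, x_j),$$

we have

$$
\begin{aligned}
V(\hat{n}_r(\lambda)) &= \lambda^{2r} \sum_{i,j=0}^{\infty} (-1)^{i+j} (\lambda-1)^{i+j} \binom{r+i}{r} \binom{r+j}{r} \operatorname{cov}(n_{r+i}, n_{r+j}) \\
&\approx \lambda^{2r} \left[\sum_{i=0}^{\infty} (\lambda-1)^{2i} \binom{r+i}{r}^{2} \mathcal{E}(n_{r+i}) - \sum_{i,j=0}^{\infty} (-1)^{i+j} (\lambda-1)^{i+j} \binom{r+i}{r} \binom{r+j}{r} 2^{-2r-i-j} \binom{2r+i+j}{r+i} \mathcal{E}(n_{2r+i+j}(2))\right] \\
&= \lambda^{2r} \left[\sum_{i=0}^{\infty} (\lambda-1)^{2i} \binom{r+i}{r}^{2} \mathcal{E}(n_{r+i}) - \sum_{l=0}^{\infty} (-1)^{l} (\lambda-1)^{l}\, 2^{-2r-l}\, \mathcal{E}(n_{2r+l}(2))\, \frac{(2r+l)!}{r!\, r!} \sum_{i+j=l} \frac{1}{i!\, j!}\right] \\
&= \lambda^{2r} \left[\sum_{i=0}^{\infty} (\lambda-1)^{2i} \binom{r+i}{r}^{2} \mathcal{E}(n_{r+i}) - 2^{-2r} \sum_{l=0}^{\infty} (-1)^{l} (\lambda-1)^{l}\, \frac{(2r+l)!}{r!\, r!\, l!}\, \mathcal{E}(n_{2r+l}(2))\right] \\
&\approx \lambda^{2r} \left[\sum_{i=0}^{\infty} (\lambda-1)^{2i} \binom{r+i}{r}^{2} \mathcal{E}(n_{r+i}) - \binom{2r}{r} (2\lambda)^{-2r}\, \mathcal{E}(n_{2r}(2\lambda))\right], \qquad (34)
\end{aligned}
$$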


using (8) again. This derivation of (34) is slightly unsatisfactory, since the second series in the previous formula may give a good approximation to the term by which we replace it only after so many terms have been taken that the approximation made in (30) is no longer valid. It may be possible to postpone making this approximation until a later stage of the calculation, but the algebra would become very heavy. In order to estimate $\mathcal{E}(n_{2r}(2\lambda))$ in calculating (34) for an actual case, it may be possible to use the method of §2 (probably in conjunction with the summation technique described in §5), or it may be easier to make a sufficiently accurate guess.

If $\hat{n}_r(\lambda)$ is defined by (18) with $n'_{r+i}$ in place of $n_{r+i}$, it becomes very difficult to make any estimate of its variance. We can say, however, that so long as we feel that it is worth using smoothed values at all, the variance of the estimate based on them is likely to be considerably less than that given by (34).


4. VARIANCE OF $\hat{n}_r(\lambda)$ CONSIDERED AS A PREDICTION OF $n_r(\lambda)$


In the last section, we were considering the question: How much may $\hat{n}_r(\lambda)$ be expected to differ from its mean value (which is equal to $\mathcal{E}(n_r(\lambda))$)? A question which may sometimes be more relevant is: How much may $\hat{n}_r(\lambda)$ be expected to differ from the value of $n_r(\lambda)$ obtained in a random second sample? To answer this question, we want to find

$$V(\hat{n}_r(\lambda) - n_r(\lambda)), \tag{35}$$

which may be called the variance of $\hat{n}_r(\lambda)$ considered as a prediction of the random variable $n_r(\lambda)$, rather than as an estimate of the parameter $\mathcal{E}(n_r(\lambda))$. It is evidently now necessary to consider separately the cases when the second sample is independent, and when it is an enlargement of the basic sample.

When the second sample is independent, $\hat{n}_r(\lambda)$ and $n_r(\lambda)$ are independent random variables, so

$$V(\hat{n}_r(\lambda) - n_r(\lambda)) = V(\hat{n}_r(\lambda)) + V(n_r(\lambda)), \tag{36}$$

which can be calculated by using (34) and the following modification of (31):

$$V(n_r(\lambda)) \approx \mathcal{E}(n_r(\lambda)) - 2^{-2r} \binom{2r}{r} \mathcal{E}(n_{2r}(2\lambda)). \tag{37}$$
In the case when the second sample is an enlargement, $\hat{n}_r(\lambda)$ and $n_r(\lambda)$ are correlated, and we have not been able to calculate (35) in this case. It may be expected to be considerably smaller than in the case of independence, at least if $\lambda$ is not large, since when $\lambda = 1$ we have $\hat{n}_r(1) = n_r = n_r(1)$ and so (35) is reduced to zero.


5. SUMMATION OF THE SERIES OBTAINED IN §§2 AND 3 


We consider the case of the general series (18); similar remarks apply to (22), (23), (24) and (32), and to the series arising in calculating (34) and (37). It was pointed out in §1 that the term $(\lambda-1)^i$ in (8) and (18) may cause trouble if $\lambda > 2$. In fact, the series is likely to become ‘practically divergent’; i.e. to behave like an infinitely oscillating series up to a point at which we become too uncertain of the value of $\mathcal{E}(n_r)$ to continue with the calculation. This difficulty is illustrated by formula (34) for the variance of $\hat{n}_r(\lambda)$; if $\lambda > 2$, the series

$$\sum_{i=0}^{\infty} \binom{r+i}{r}^{2} (\lambda-1)^{2i}\, \mathcal{E}(n_{r+i})$$

is likely to have a very large sum, unless $\mathcal{E}(n_{r+i})$ decreases extremely fast. It is natural to try to overcome this difficulty by using a method of summation which is known to make some oscillating series converge; a convenient method appears to be that of Euler, with a parameter $q$, called the $(E, q)$ method (Hardy, 1949, pp. 178 ff.). This is to transform the series $\sum_{i=0}^{\infty} a_i$ into $\sum_{j=0}^{\infty} a_j^{(q)}$, where

$$
\begin{aligned}
a_j^{(q)} &= \frac{1}{(q+1)^{j+1}} \sum_{i=0}^{j} \binom{j}{i} q^{j-i} a_i \qquad (38) \\
&= (-1)^j \left(\frac{q}{q+1}\right)^{j+1} \Delta^j \left[\frac{(-1)^i a_i}{q^{i+1}}\right]_{i=0}, \qquad (39)
\end{aligned}
$$

the forward difference symbol $\Delta^j$ being defined inductively by

$$\Delta a_i = a_{i+1} - a_i, \qquad \Delta^j a_i = \Delta(\Delta^{j-1} a_i). \tag{40}$$

(The form (39) is given by Bromwich (1926), pp. 62–6, for the case $q = 1$. It leads to a convenient method of setting out the work in a practical example, which will be illustrated in Example (i) of the next section.) If $\sum a_i$ converges, then $\sum a_j^{(q)}$ converges to the same sum, for any $q > 0$; if $\sum a_j^{(q')}$ converges, then $\sum a_j^{(q)}$ converges to the same sum for all $q > q'$ (Hardy, 1949, Theorems 117 and 118).
In practical examples, $n_{r+i}$ generally decreases slowly after the first few terms, and we are usually interested in small values of $r$, so that $\binom{r+i}{r}$ increases slowly. Under these circumstances, the series (18) is, after the first few terms, nearly a G.P. with ratio $-(\lambda-1)$. Now, if we apply the $(E, q)$ method to such a G.P., say

$$a_i = (-1)^i (\lambda-1)^i a_0, \tag{41}$$

we obtain

$$
\begin{aligned}
a_j^{(q)} &= \frac{a_0}{(q+1)^{j+1}} \sum_{i=0}^{j} \binom{j}{i} q^{j-i} (-(\lambda-1))^{i} \\
&= a_0\, \frac{(q-(\lambda-1))^{j}}{(q+1)^{j+1}} \\
&= \frac{a_0}{q+1} \left(\frac{q-(\lambda-1)}{q+1}\right)^{j}, \qquad (42)
\end{aligned}
$$

i.e. the transformed series is a G.P. with ratio $\dfrac{q-(\lambda-1)}{q+1}$. Clearly the best value of $q$ to select is $\lambda-1$, which reduces all but the first term of the transformed series to zero.† If the $n_{r+i}$ decrease fairly rapidly, we may get better results by choosing $q$ somewhat smaller. (This is the case in Example (i) of §6, where $\lambda = 5$, but we take $q = 2$.) When $r = 0$ or 1, and if $n_1 \gg n_2$, it may be worth taking out the first term of the series, and applying the summation process to the remainder; the reason for this can be seen by considering such a series as

$$1 - \tfrac{1}{8} + \tfrac{1}{4} - \tfrac{1}{2} + \ldots. \tag{43}$$

If we apply the $(E, 2)$ method directly, we get

$$0.333 + 0.208 + 0.139 + 0.093 + \ldots, \tag{44}$$

while if we take out the first term and apply the $(E, 2)$ method to the remainder, we get

$$1.000 - 0.042 + 0.000 + 0.000 + \ldots, \tag{45}$$

which is evidently better.
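The comparison can be checked numerically. The following Python sketch applies the $(E, q)$ transform of (38) to the series $1 - \tfrac{1}{8} + \tfrac{1}{4} - \tfrac{1}{2} + \ldots$ (a first term followed by a geometric progression of ratio $-2$), once directly and once with the first term taken out. The function name `euler_transform` is an assumption of this sketch, not the paper's notation.

```python
from math import comb

def euler_transform(a, q):
    """Terms a_j^(q) = (q+1)^-(j+1) * sum_i C(j,i) q^(j-i) a_i, as in (38)."""
    return [
        sum(comb(j, i) * q ** (j - i) * a[i] for i in range(j + 1))
        / (q + 1) ** (j + 1)
        for j in range(len(a))
    ]

# First five terms of the series: 1, -1/8, 1/4, -1/2, 1
series = [1.0] + [-0.125 * (-2.0) ** k for k in range(4)]

direct = euler_transform(series[:4], 2)        # (E, 2) applied to the whole series
remainder = euler_transform(series[1:], 2)     # (E, 2) applied after removing the first term
split = [series[0]] + remainder

print([round(t, 3) for t in direct])
print([round(t, 3) for t in split])
```

Since the remainder is an exact G.P. with ratio $-(\lambda-1) = -2$ and $q = 2$, every transformed term after the first is zero, in agreement with (42); the sum is $1 - \tfrac{1}{24}$.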

When we have chosen a method of summation, and selected a partial sum of the transformed series as probably giving a sufficiently good approximation to the final sum, we can express this partial sum as a linear combination of the $n_r$, and deduce its variance, as in §3.‡ But there is now a new source of error, namely, the omission of the rest of the transformed series. We have not been able to find a useful form of error term for this remainder (corresponding to the statement that alternate partial sums of (8) err in excess and defect); failing such an error term, our results must be used with caution when it is necessary to apply the summation process. If the $n_r$'s decrease slowly and $q$ is taken to be slightly smaller than $\lambda-1$, the transformed series will generally have terms alternating in sign (cf. equation (42)); it might then be hoped that the partial sums err alternately in excess and defect, but it does not seem to be possible to lay down any simple general conditions under which this is the case.

Some of the methods described by Shanks (1955) also seem to be very well suited to our case; they have the property of summing perfectly any series which is geometric from some point onwards, so that the difficulty caused by an excessively large first term, noted above, does not arise. Given the series $\sum_{n=0}^{\infty} a_n$, we define a sequence (not a new series) by

$$B_n = \sum_{r=0}^{n-1} a_r - \frac{a_n^2}{a_{n+1} - a_n} \qquad (n = 1, 2, 3, \ldots);$$

repetition of the process gives a sequence $C_n$, and so on. The $e_1$ method consists of considering the sequence $B_n$ (in place of the sequence of partial sums $A_n = \sum_{r=0}^{n} a_r$), the $e_1^2$ method, of considering the sequence $C_n$, and so on; the $\tilde{e}_1$ method consists of considering the sequence $A_0, B_1, C_2, \ldots$. For an example, see §6, Example (i).
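A minimal Python sketch of one pass of this transformation is given below. It works on the partial sums and uses the algebraically equivalent form $B_n = (A_{n+1}A_{n-1} - A_n^2)/(A_{n+1} + A_{n-1} - 2A_n)$; the function name `shanks` and the test series are assumptions of the sketch. No guard is included for a vanishing denominator, which occurs when three successive partial sums are in arithmetic progression.

```python
def shanks(A):
    """One e_1 pass: from partial sums A_0..A_m return B_1..B_{m-1}."""
    return [
        (A[n + 1] * A[n - 1] - A[n] ** 2) / (A[n + 1] + A[n - 1] - 2 * A[n])
        for n in range(1, len(A) - 1)
    ]

# Divergent geometric series 3 - 6 + 12 - 24 + 48, whose formal sum is 3 / (1 + 2) = 1.
terms = [3.0 * (-2.0) ** k for k in range(5)]
partial_sums = [sum(terms[: k + 1]) for k in range(len(terms))]   # 3, -3, 9, -15, 33

B = shanks(partial_sums)
print(B)
```

Every $B_n$ equals the formal sum exactly, which illustrates the property quoted above: a series that is geometric from some point onwards is summed perfectly.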


† This statement may appear to conflict with the remark of Hardy (1949), p. 180, that ‘as $q$ increases, the $(E, q)$ methods form a scale of increasing strength’. But here ‘strength’ refers only to whether we obtain a convergent series or not: if we choose $q$ unnecessarily large, we shall certainly obtain a convergent series, but it will converge very slowly.

‡ This remark applies to any method of summation by a linear transformation of the series, e.g. Cesàro means, the composite (H, q; C, k) method, any Hausdorff means, Hölder means, Hutton’s method, any Nörlund means, or quasi-Hausdorff transformations; for references to all these methods, see Hardy (1949), p. 392. It does not apply to the non-linear methods of Shanks (1955), mentioned below.
6. EXAMPLES


The first example is an artificial one designed to test the efficacy of the methods described above, especially the summation methods of §5. The second and third examples illustrate the practical applications, but enlarged samples are not available for verifying the estimates.

Example (i). Sample of words from ‘Our Mutual Friend’ by Charles Dickens. The following samples were taken:

A, of 1000 words, the last words of lines on pages ≡ 5 mod 25,
B, of 2000 words, the last words of lines on pages ≡ 10 or 20 mod 25,
C, of 2000 words, the last words of lines on pages ≡ 15 or 25 mod 25,

the sampling in each case being carried as far as required to make up the prescribed number of words. Our original intention was to use A as the basic sample ($N = 1000$) and to calculate





























Table 1

Sample A; N = 1000

   r      n_r     n'_r
   1      404     404
   2       57      64
   3       24      25
   4       16      12.2
   5        6       6.2
   6        3       -
   7        0       -
   8        3       -
  ≥9       15       -

d = 528











values of $\hat{d}(\lambda)$ and $\hat{n}_1(\lambda)$ given by (19) and (18) for $\lambda = 2, 3, 4, 5$, which could be checked against the values of $d(\lambda)$ and $n_1(\lambda)$ actually obtained from the samples B, A+B, B+C, A+B+C. The results, however, showed a systematic and, for $\hat{d}(2)$, significant difference between the prediction and the observed result. Working back from sample B with $\lambda = \tfrac{1}{2}$, it appeared that sample A had $n_1$ considerably too small. We believe that this is due to the fact that the method of sampling used was not sufficiently random; an uncommon word is likely to occur several times on the same page, where a particular topic is discussed, and such a word is therefore less likely to occur just once in a sample selected as described than in a random sample of the same size.†

The results for larger values of $\lambda$ were, however, not much less accurate than those for $\lambda = 2$, and we give the calculation of $\hat{d}(5)$ as an example of the use of the $(E, q)$ method of summation described in §5. Table 1 shows the data; the $n'_r$ were obtained by graphical smoothing of $\sqrt{n_r}$. Our formula (19) gives us

$$\hat{d}(5) = 528 + (4 \cdot 404 - 4^2 \cdot 64 + 4^3 \cdot 25 - 4^4 \cdot 12.2 + 4^5 \cdot 6.2 - \ldots). \tag{46}$$


† Consider the extreme case when p. 1 reads ‘one one one...’, p. 2 ‘two two two...’, and so on.

To transform the bracketed series we form the difference table suggested by (39) ($q$ has been chosen as 2, so as to make the differences small):

  (1/2)   . 4   . 404  = 808
                                 552
  (1/2)^2 . 4^2 . 64   = 256            496
                                  56            445
  (1/2)^3 . 4^3 . 25   = 200             51            402
                                   5             43
  (1/2)^4 . 4^4 . 12.2 = 195              8
                                  -3
  (1/2)^5 . 4^5 . 6.2  = 198

(We apply the usual check, that the sum of each column is equal to the difference between the top and bottom of the one before.) The transformed series, by (39), is

$$\tfrac{2}{3} \cdot 808 + (\tfrac{2}{3})^2 \cdot 552 + (\tfrac{2}{3})^3 \cdot 496 + (\tfrac{2}{3})^4 \cdot 445 + (\tfrac{2}{3})^5 \cdot 402 + \ldots = 538 + 245 + 147 + 88 + 53 + \ldots. \tag{47}$$

The last few terms of (47) are approximately a geometric series with ratio 0.6; the sum of the remaining terms should therefore be approximately†

$$53 \times \frac{0.6}{1 - 0.6} = 79,$$

making a total of 1150; hence

$$\hat{d}(5) = 528 + 1150 = 1678. \tag{48}$$


Applying the methods of Shanks (1955), described at the end of §5, we get Table 2. Although the transformed sequences are rather short, it looks as if $C_2 = 1155$ is a good approximation to the limit, giving $\hat{d}(5) = 1683$. In fact, for the whole sample A+B+C, $d(5) = 1832$.














Table 2

   n      A_n     B_n     C_n
   0     1616
   1      592    1216
   2     2192    1134    1155
   3     -931    1162
   4     5418
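The arithmetic of this example can be reproduced with a short Python sketch, given here as a check rather than as the authors' own procedure. It uses the smoothed values of Table 1, the $(E, 2)$ transform in the form (38), the geometric tail with ratio 0.6 taken from the text, and the Shanks transformation in its partial-sum form; the function and variable names are assumptions. Because the sketch carries unrounded values, its totals differ from the printed ones by about a unit.

```python
from math import comb

def euler_terms(a, q):
    """Terms of the (E, q) transform of the series a, as in (38)."""
    return [
        sum(comb(j, i) * q ** (j - i) * a[i] for i in range(j + 1))
        / (q + 1) ** (j + 1)
        for j in range(len(a))
    ]

def shanks(A):
    """One e_1 pass over partial sums A_0..A_m, giving B_1..B_{m-1}."""
    return [
        (A[n + 1] * A[n - 1] - A[n] ** 2) / (A[n + 1] + A[n - 1] - 2 * A[n])
        for n in range(1, len(A) - 1)
    ]

smoothed = [404, 64, 25, 12.2, 6.2]   # n'_1 .. n'_5 from Table 1
# Bracketed series of (46): 4*404 - 4^2*64 + 4^3*25 - ...
a = [(-1) ** i * 4 ** (i + 1) * x for i, x in enumerate(smoothed)]

transformed = euler_terms(a, 2)                 # the terms of (47)
tail = transformed[-1] * 0.6 / (1 - 0.6)        # geometric remainder, ratio 0.6
d_euler = 528 + sum(transformed) + tail         # compare (48)

partial = [sum(a[: k + 1]) for k in range(len(a))]   # column A_n of Table 2
B = shanks(partial)                                  # B_1, B_2, B_3
C = shanks(B)                                        # C_2
d_shanks = 528 + C[0]

print([round(t) for t in transformed], round(d_euler))
print([round(b) for b in B], round(C[0]), round(d_shanks))
```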





In this example, we have been slightly handicapped by having so few terms of the series available; when using the $(E, q)$ method, this renders the remainder somewhat uncertain, and prevents us from omitting the first term from the summation process, as was suggested in §5. Because of this difficulty and the apparent non-randomness noted above, we use sample B ($N = 2000$) as the basic sample for a more comprehensive test of our methods, although we can then verify the results only up to $\lambda = 2.5$. (The ‘second samples’ for $\lambda = 1.5, 2.0, 2.5$ are A+B, B+C, A+B+C respectively, and are thus all enlargements of the basic sample.) Table 3 gives the data for this basic sample; the $n'_r$ were produced by smoothing $\sqrt{n_r}$ graphically by the use of French curves, the $n''_r$, independently, by smoothing by eye; $\hat{d}(\lambda)$, $\hat{n}_1(\lambda)$, and the estimated percentage coverage, $100(1 - \hat{n}_1(\lambda)/\lambda N)$, were calculated for $\lambda = 1.5, 2.0, 2.5$, using the three sets of values. The summation process was used only in the case $\lambda = 2.5$, with $q = 1$. Table 4 shows the three sets of estimates and the actual results found in the enlarged samples; standard deviations are given where applicable, calculated from (31), (33), and (34). It will be noticed that in this case little or nothing was gained when smoothed values were used; but it would probably be essential to use smoothed values when working with larger values of $\lambda$.

† This is, as Shanks (1955) points out, equivalent to applying the $e_1$ method to sum (47).


Table 3

Sample B; N = 2000

   r      n_r     n'_r    n''_r
   1      729     729     729
   2      108      96     110
   3       33      38      38
   4       23      21      19
   5       17      14      13
   6        7       9       9
   7        5       7       6
   8        3       3.2     4
  ≥9       30       -       -

d = 955











Example (ii). Captures of Macrolepidoptera in a light-trap at Rothamsted. (Quoted as 
example (i) in G, §8 from Williams’s data in Corbet, Fisher & Williams (1943).) N = 15609, 


d = 240. Table 5 shows the small values of r. n} is n; of the example in G, obtained by 


r 
smoothing > tn,. ny is Fisher’s analytic smoothing, given by H, of G with parameter 
t=1 


f = 40-2. Now H, is a hypothesis defining the distribution of the population frequencies 
{p,}, and it implies that 





Bi NA \r 
N 
(G, (63)), and &(d(A)) = flog, (f+ 1) (50) 
(G, (67)). Since N > #, we see that H, implies 
&(n,(A))=E(n,) (51) 
and & (d(A))=d + flog, A. (52) 


Putting A = 2 we see that doubling the sample will approximately halve the proportion of 
the population not represented (by (51) and (3)) and increase the number of distinct species 
observed by approximately flog, 2 = 27-9. (The latter fact was noted by Williams in 
Corbet et al. (1943), p. 51.) 
Table 4

                              λ = 1.5        λ = 2.0        λ = 2.5
Estimates of d(λ)
  Using n_r                  1296 ± 34      1599 ± 50      1872
  Using n'_r                 1299           1613           1909
  Using n''_r                1296           1601           1890
  Actual results             1303 ± 29      1551 ± 31      1832 ± 34

Estimates of n_1(λ)
  Using n_r                   958 ± 42      1172†          1322
  Using n'_r                  981           1238           1435
  Using n''_r                 961           1168           1350
  Actual results              983 ± 29      1116 ± 31      1308 ± 32

Estimates of % coverage
  Using n_r                  68.1 ± 1.4     70.7           73.6
  Using n'_r                 67.3           69.0           71.3
  Using n''_r                68.0           70.8           73.0
  Actual results             67.2 ± 1.0     72.1 ± 0.8     73.8 ± 0.6

† The S.D. is not given in this case because the sum of the series was not taken to infinity but estimated after eight terms as lying midway between the last partial sums; that is, in effect, Hutton’s method (Hu, 1) was applied to sum the series (Hardy, 1949, pp. 21-2). (34) would give a very large variance, most of which arises from terms after the eighth.























Table 5

   r      n_r     n'_r    n''_r
   1       35     35.0    40.1
   2       11     22.5    20.0
   3       15     16.3    13.3
   4       14     12.3    10.0
   5       10      9.7     7.9
   6       11      7.7     6.6
   7        5      6.0     5.6
The present example is not a good one to which to apply our distribution-free† methods, since even $n_1$ is rather small; however, we shall obtain the corresponding results for comparison, using the smoothed values $n'_r$. By (18),

$$
\begin{aligned}
\hat{n}_1(2) &= 2(35.0 - 45.0 + 48.9 - 49.2 + 48.5 - 46.2 + 42.0 - \ldots) \\
&= 2(17.5 - 2.5 - 0.8 - 0.2 + 0.0 + 0.1 + 0.1 + \ldots) \qquad \text{(transforming the series by } (E, 1)\text{)} \\
&= 28.4,
\end{aligned}
$$


whence (using (3)) we predict that the proportion of the population not covered should decrease from

$$\frac{35.0}{15609} \approx 0.22\,\% \quad \text{to} \quad \frac{28.4}{31218} \approx 0.09\,\%,$$

whereas H, implied that it was halved. By (24), the expected number of new species is 
approximately 


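$$
\begin{aligned}
&35.0 - 22.5 + 16.3 - 12.3 + 9.7 - 7.7 + 6.0 - \ldots \\
&\quad = 17.5 + 3.1 + 0.8 + 0.3 + 0.1 + 0.1 + 0.0 + \ldots \qquad \text{(by } (E, 1)\text{)} \\
&\quad = 21.9,
\end{aligned}
$$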


whereas H, implied that 27-9 were expected. Finally, in order to estimate the s.p. of n, we 
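calculate from (18)

$$
\begin{aligned}
\hat{n}_2(2) &= 4(22.5 - 48.9 + 73.8 - 97.0 + 115.5 - 126.0 + \ldots) \\
&= 4(11.25 - 6.60 - 0.19 + 0.01 - 0.09 - 0.04 + \ldots) \qquad \text{(by } (E, 1)\text{)} \\
&= 17.4,
\end{aligned}
$$

so, by (31),

$$V(n_1) \approx 35 - \tfrac{1}{4} \cdot 2 \cdot 17.4 = 26.3,$$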
giving n, a 8.D. of about 5-1, whereas H, (by (65) of G) gave 5-5. 
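The distribution-free figures of this example can be recomputed from the smoothed values $n'_1, \ldots, n'_7$ of Table 5 with the following Python sketch. It forms the available terms of (18) and (24) for $\lambda = 2$, sums them by the $(E, 1)$ transform of (38), and then applies (31). The helper names are assumptions, and only the seven tabulated terms are used, so the results agree with the printed ones to the accuracy quoted.

```python
from math import comb, sqrt

def euler_sum(a, q=1):
    """Sum of the (E, q)-transformed terms of the finite list a, as in (38)."""
    return sum(
        sum(comb(j, i) * q ** (j - i) * a[i] for i in range(j + 1))
        / (q + 1) ** (j + 1)
        for j in range(len(a))
    )

n_smooth = [35.0, 22.5, 16.3, 12.3, 9.7, 7.7, 6.0]   # n'_1 .. n'_7 (Table 5)

def series_terms(r, lam):
    """Available terms of the series in (18), using the smoothed values."""
    return [
        (-1) ** i * comb(r + i, i) * (lam - 1) ** i * n_smooth[r + i - 1]
        for i in range(len(n_smooth) - r + 1)
    ]

n1_hat = 2 ** 1 * euler_sum(series_terms(1, 2))                # estimate of n_1(2)
new_species = euler_sum([(-1) ** i * x for i, x in enumerate(n_smooth)])  # (24), lambda = 2
n2_hat = 2 ** 2 * euler_sum(series_terms(2, 2))                # estimate of n_2(2)
var_n1 = n_smooth[0] - 2 ** -2 * comb(2, 1) * n2_hat           # (31) with r = 1
sd_n1 = sqrt(var_n1)

print(round(n1_hat, 1), round(new_species, 1), round(n2_hat, 1),
      round(var_n1, 1), round(sd_n1, 1))
```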

We note that our distribution-free estimates of n,(2), n.(2), d(2) are all less than the values 
implied by H,. This is what one would expect when the sample size, N, is large enough for 
the finiteness of s (the total number of species in the population) to conflict with the pre- 
diction of H, that s = 00; but the effect may well be accidental. 

In this example, we can also make a rough comparison between the variances of the 
estimate of n,(A) deduced from H, and that given by our distribution-free method. The 
former estimate is almost exactly /, and its variance is therefore approximately equal to 
the sampling variance of f for this example, given by Fisher (Corbet et al. (1943), p. 56) as 
1-13 (s.D. of 1-06); this is independent of A. Our 7,(A), on the other hand, has a very large 
variance for A> 2 (by (34)), and even for A = 1, since 7,(1) = 7, its variance, as we saw 
above, is about 26-3 (s.p. 5:1). However, it must be remembered, first, that H, is certainly 
not exactly true (since in fact s < 00), } so that the estimate deduced from it is subject to an 
unknown additional error, and secondly, that we may hope to reduce the variance of 
(A) considerably by using carefully smoothed values and summation methods. 

+ I.e. independent of any particular assumption about the distribution of the p,, in contrast to the 


above argument which assumes H;3. 
{ Not even the truncated form, H, of G, can be exactly true, as was shown in G, p. 257. 
In general, if a fairly simple hypothesis H on the p, (e.g. any of H, to H, of G) gives a good 
fit for the n,, we should prefer to deduce &(n,(A)) and &(d(A)) from H, rather than use the 
distribution-free estimates (18) and (19); but such extrapolation should be made with 
caution, and the distribution-free methods may give a useful indication of the error to 
be expected if H is false. 

Example (iii). Sample of nouns in Macaulay’s essay on Bacon. (From Yule (1944), 
Table 44, p. 163; quoted in G as example (iii), p. 260.) N = 8045, d = 2048. 

















Table 6

   r      n_r     n'_r          r        n_r     n'_r
   1      990    1024          11         24     15.5
   2      367     341          12         19     13.1
   3      173     170          13         10     11.3
   4      112     102          14         10      9.7
   5       72      68          15         13      8.5
   6       47      49        16-20        31     30.5
   7       41      35.5      21-30        31     31.5
   8       31      28.5      31-50        19     25.9
   9       34      22.7      51-100        6     19.9
  10       17      18.4      101-∞         1     20.3

(As in the tables in G, $n_r$ and $n'_r$ have been summed where values of $r$ are grouped.)
Here $n'_r$ (= $n''_r$ of G) is the analytic smoothing

$$n'_r = \frac{2048}{r(r+1)} \tag{53}$$

(H, of G; notice that this is not an explicit hypothesis on the p,, and that it gives a good 


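fit only for $r < 30$.) (53) is so simple in form that we can carry through all our calculations analytically. Again we consider doubling the sample ($\lambda = 2$); by (18),

$$
\begin{aligned}
\hat{n}_1(2) &= 2 \sum_{i=0}^{\infty} (-1)^i (i+1)\, \frac{2048}{(i+1)(i+2)} \\
&= 2 \cdot 2048 \sum_{i=0}^{\infty} \frac{(-1)^i}{i+2} \\
&= 2 \cdot 2048\,(1 - \log_e 2) \\
&= 1260;
\end{aligned}
$$

and

$$
\begin{aligned}
\hat{n}_2(2) &= 4 \sum_{i=0}^{\infty} (-1)^i\, \frac{(i+1)(i+2)}{2}\, \frac{2048}{(i+2)(i+3)} \\
&= 2 \cdot 2048 \sum_{i=0}^{\infty} (-1)^i\, \frac{i+1}{i+3} \\
&= 2 \cdot 2048 \left[\sum_{i=0}^{\infty} (-1)^i - 2 \sum_{i=0}^{\infty} \frac{(-1)^i}{i+3}\right].
\end{aligned}
$$

The first series is summed by any standard method to $\tfrac{1}{2}$, and the second is equal to $\log_e 2 - \tfrac{1}{2}$; hence

$$\hat{n}_2(2) = 2 \cdot 2048\,(\tfrac{3}{2} - 2 \log_e 2) = 465.$$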























toms 
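Finally, by (19),

$$
\begin{aligned}
\hat{d}(2) &= 2048 - \sum_{i=1}^{\infty} (-1)^i\, \frac{2048}{i(i+1)} \\
&= 2048 \left[1 + \sum_{i=1}^{\infty} (-1)^{i+1} \left(\frac{1}{i} - \frac{1}{i+1}\right)\right] \\
&= 2048 \cdot 2 \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i} \\
&= 2048 \cdot 2 \log_e 2 \\
&= 2840.
\end{aligned}
$$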
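As a numerical cross-check, offered as a sketch and not as part of the original calculation, the three closed forms can be compared with the series (18) and (19) summed directly by the $(E, 1)$ transform, using the analytic smoothing $n'_r = 2048/(r(r+1))$. The function names and the choice of 40 terms are assumptions. Unrounded, the closed forms come to about 1256.9, 465.7 and 2839.1, which agree with the figures quoted in the text to roughly three significant figures.

```python
from math import comb, log

D = 2048

def euler_sum(a, q=1.0):
    """Sum of the (E, q)-transformed terms of the finite list a, as in (38)."""
    return sum(
        sum(comb(j, i) * q ** (j - i) * a[i] for i in range(j + 1))
        / (q + 1) ** (j + 1)
        for j in range(len(a))
    )

def n_smooth(r):
    """Analytic smoothing (53)."""
    return D / (r * (r + 1))

def n_hat(r, lam, terms=40):
    """Series (18) with smoothed values, summed by (E, 1)."""
    a = [(-1) ** i * comb(r + i, i) * (lam - 1) ** i * n_smooth(r + i)
         for i in range(terms)]
    return lam ** r * euler_sum(a)

def d_hat(lam, terms=40):
    """Series (19) with smoothed values, summed by (E, 1)."""
    a = [-((-1) ** i) * (lam - 1) ** i * n_smooth(i) for i in range(1, terms + 1)]
    return D + euler_sum(a)

closed_n1 = 2 * D * (1 - log(2))
closed_n2 = 2 * D * (1.5 - 2 * log(2))
closed_d = 2 * D * log(2)

print(round(closed_n1, 1), round(n_hat(1, 2), 1))
print(round(closed_n2, 1), round(n_hat(2, 2), 1))
print(round(closed_d, 1), round(d_hat(2), 1))
```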


Notice that, since the $n'_r$ give a good fit only for $r < 30$, the justification for substituting them in infinite series rests on the following argument:

(i) the partial sum of the series† with the true $\mathcal{E}(n_r)$ down to the term containing $\mathcal{E}(n_{30})$ is a good approximation to its infinite sum;

(ii) the same is true for the series† with the $n'_r$;

(iii) the $n'_r$ are good approximations to the $\mathcal{E}(n_r)$ for $r < 30$, so that the partial sums mentioned are nearly equal.

(Compare the argument justifying evaluation of integrals by the saddle-point method.) (i)-(iii) should be borne in mind whenever an analytic smoothing is used in this way, or even when it is used to give values of $n'_r$ which are treated numerically; otherwise there is some risk of obtaining an apparently satisfactory convergence which is in fact spurious. When possible, it would probably be advisable in such cases to try a graphical smoothing as well: the reader might like to try the smoothing $n'_r$ of G, using the $(E, 1)$ or $\tilde{e}_1$ method of summation.

We have now sufficient data to derive the result which was quoted in §1. By (3), the proportion of the population not represented in the 2048 nouns of the basic sample is about‡

$$\frac{1024}{8045} = 12.7\,\%;$$

by (7) of G the proportion not represented in the 2840 - 1260 = 1580 nouns occurring twice or more in the doubled sample will be about

$$\frac{1260 + 2 \cdot 465}{16090} = 13.6\,\%;$$

by (2) of G, the average frequency of the 1260 nouns occurring once only in the doubled sample will be about

$$\frac{2 \cdot 465}{1260 \cdot 16090} \approx 0.0046\,\%.$$

Hence, if we add a random selection of

$$\frac{13.6 - 12.7}{0.0046} \approx 200$$

of the nouns occurring once only in the doubled sample to all those occurring twice or more, we will have a list of about 1780 nouns covering approximately the same proportion of the population as the 2048 nouns of the basic sample.

† Or, if summation methods are used (as in calculating $\hat{n}_2(2)$ above), the sum of the transformed series.

‡ The figure of 12.3 % given in G was based on the unsmoothed values.
APPENDIX 


Conditions for the lemma of §1 


Although this lemma was not actually used in our argument, we give here, for any reader who may be 
interested, two fairly general sets of conditions under which it holds. 
If a,>0 for all r, and finite numbers b, are defined by (4), then (5) holds if and only if 
bps . 
= 70 as i+. (54) 
a: 
LB (=Dibae 


T? 


Proof. Writ R(n,r) = = 
roof. Write (n, 7) = Fa! ii 


so that (5) holds if and only if R(n,7) +0 as n > 00. Now, for allu>r, 


1% (-1) 





fo 6) 
R(n,r) = — —— & arta, —a, 
rlizo t! g=0 
2. fe n (s—r 
== ( jad (—0*( )-a, 
s=0 \"/ i=0 v 
= s\ /s—r—-1 
= 2 (—1)* A,— Arps 
s=0 r n 
: a QQ) 2 : " 
using the definition ») = 5, oven if a is negative, together with the well-known identity 


C4 rey ee 

: = , + : : 

a i—1 a 

Putting s = r gives a term+a,, and all other terms with s<n+7r+1 vanish; hence 


Rin,r)=(-1)" (VC a, 
s=n+r+1 \" n 


2 8s—r—nsint 
=(-1" = 


s=ntrt+1 &—7r nir! 








i. (55) 
Now, since a,>0 for all s, 





and the sufficiency of (54) follows. The necessity is trivial, since if (54) does not hold the right-hand side 
of (5) cannot converge. 

If a, = O(z"), O<a<4, then (5) holds for all r; further, (5) does not hold if a, = 2-7, so this result 
cannot be improved by extending the range of x. 

Proof. (i) We may assume without loss of generality that | a,|<2’. Then it follows from (55) above 
that . 


| R(n,r) |< 


gin” 
as 





s=n+r nir! 


(n+r)!a* 2 (—n—r—-1 . 
aaa (—3) 
nir! t=0 t 


7” 1 es + rn x n+r+1 
l-az 


riz 








+0 as n> provided x< 3; 
hence (5) holds for all *. 
(ii) Taking a, = 2-7, we have ia : Koy" 
r=0 
= 3.41, 


summing the series as in (i), and it is clear that the right-hand side of (5) is not convergent for any r. 
A SEQUENTIAL TEST OF RANDOMNESS FOR EVENTS 
OCCURRING IN TIME OR SPACE 


By D. J. BARTHOLOMEW 
University College London and Scientific Department, National Coal Board, London 


1. INTRODUCTION 


In a variety of practical problems it is necessary to test whether a sequence of events is 
occurring at random in time or space. To the casual observer even a random series will 
suggest fluctuations in the density of events, so that an objective test is clearly required to 
establish randomness or otherwise. 

The problem was discussed by Maguire, Pearson & Wynn (1952) as applied to industrial 
accidents; they pointed out that the basic data in such cases consisted of an ordered sequence 
of intervals between events. Recent investigations into the possible departures from 
randomness have shown that it is sometimes appropriate to use a test based on intervals, 
but on other occasions the times form the best starting point. In this paper we give a 
sequential test which utilizes the times at which events occur because the alternative 
envisaged requires this approach. A sequential test is of special value in this case because 
the observations become available one at a time; provided that they do not follow each 
other too rapidly it should be possible to carry out a test as the process develops. 


2. THE ALTERNATIVES TO RANDOMNESS 


In order to derive a sequential test we must first of all define the class of alternatives to 
which the test must be sensitive; this choice, of course, will be governed by the problem in 
view. Where the events are accidents or machine failures, for example, it will be most 
important to detect a systematic increase or decrease in the rate at which events are 
occurring; minor fluctuations are relatively unimportant. The test to be put forward is 
designed to be sensitive to the presence of this type of alternative. 

We shall consider the class of processes for which 

$$\Pr\{\text{event in } (T, T+dT)\} = \lambda(T)\,dT + o(dT) \qquad (\lambda(T) > 0 \text{ for } T > 0),$$

where $T$ is time (or distance) measured from the point at which observation was begun and 
$\lambda(T)$ is a monotonic function of $T$. The case $\lambda(T)$ = constant yields the familiar Poisson 
process; this will be the null hypothesis corresponding to randomness, denoted by $H_0$. It 
will be convenient to refer to $\lambda(T)$ as the rate at which events are occurring at time $T$. The 
problem thus becomes that of testing the hypothesis $\lambda(T)$ = constant against the alternative 
$\lambda(T)$ increasing or decreasing. As an example of a situation where this sort of alternative 
would be suitable, the data given by Maguire et al. (1952, Table 1), concerning mining 
accidents, may be mentioned. Accident and failure data, already referred to, furnish many 
examples of this type. 

The raw material for the test consists of the times at which events are observed to occur, 
$T_1, T_2, \ldots, T_n$, and the first step must be to find their joint distribution. The distribution of 










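$T_i$ clearly depends only on that of $T_{i-1}$, so that $p(T_1, T_2, \ldots, T_n)$ may be obtained from the 
relation 

$$p(T_1, T_2, \ldots, T_n) = \prod_{i=1}^{n} p(T_i \mid T_{i-1}), \qquad T_0 = 0.$$

Now 

$$p(T_i \mid T_{i-1}) = \lambda(T_i) \exp\left[-\int_{T_{i-1}}^{T_i} \lambda(t)\,dt\right],$$

therefore 

$$p(T_1, T_2, \ldots, T_n) = \prod_{i=1}^{n} \lambda(T_i)\, \exp\left[-\int_{0}^{T_n} \lambda(t)\,dt\right]. \qquad (1)$$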


In order that this can represent a joint-probability density function a further condition 
must be imposed on $\lambda(T)$, namely, 

$$\lim_{T \to \infty} \int_0^T \lambda(t)\,dt = \infty.$$

This ensures that $\lambda(T)$ does not decay too rapidly; it is always satisfied if $\lambda(T)$ is monotonic 
increasing. 

Before the Wald test can be derived $\lambda(T)$ must be completely specified; it will need to 
involve a scale factor as well as some parameter which determines the degree of departure 
from randomness. Without going farther than requiring that $\lambda(T)$ shall be monotonic 
decreasing or increasing a wide choice exists among the possible functional forms. It 
therefore seems reasonable to investigate the consequences of choosing that form which 
leads to the most straightforward mathematical treatment. For mathematical convenience 
therefore we choose 

$$\lambda(T) = \mu(\mu T)^a \qquad (a > -1).$$

This is monotonic decreasing for $-1 < a < 0$ and increasing for $0 < a < \infty$; the problem 
finally becomes that of testing the hypothesis $a = 0$ against either or both the alternatives 
$a > 0$ or $a < 0$. $\mu$ is a nuisance parameter which will be eliminated from the problem. 
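
To experiment with the alternative just defined, event times can be simulated. The following is a minimal sketch, not part of the original paper: it assumes the standard time-change construction, in which the cumulative rate $(\mu T)^{a+1}/(a+1)$ is set equal to the arrival times of a unit-rate Poisson process. The function name `simulate_event_times` and its arguments are illustrative only.

```python
import random


def simulate_event_times(n, a, mu=1.0, seed=None):
    """Draw T_1 < ... < T_n from the process with rate mu * (mu*T)**a, a > -1.

    The cumulative rate is (mu*T)**(a+1) / (a+1); setting it equal to the
    arrival times of a unit-rate Poisson process and solving for T gives the
    event times of the time-dependent process.
    """
    if a <= -1.0:
        raise ValueError("a must be greater than -1")
    rng = random.Random(seed)
    times = []
    g = 0.0  # running arrival time of the unit-rate Poisson process
    for _ in range(n):
        g += rng.expovariate(1.0)
        times.append(((a + 1.0) * g) ** (1.0 / (a + 1.0)) / mu)
    return times
```

With `a = 0` the function reduces to an ordinary Poisson process of rate `mu`, which is the null hypothesis; changing `mu` only rescales the times, which is why it can be treated as a nuisance parameter.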

The important question which now arises concerns how far the test derived on this 
assumption will be efficient when the true alternative takes some other form. For example, 
it will be noted that the form proposed implies that when $T = 0$, $\lambda(T) = 0$ if $a > 0$ and 
$\lambda(T) = \infty$ if $a < 0$. Thus taken rigorously the origin chosen for $T$ is of considerable importance. 
Although therefore the test based on the alternative $\lambda(T) = \mu(\mu T)^a$ will not always be the 
best possible, there are good reasons for believing that it will nevertheless be a good one 
under a variety of conditions. This matter will be discussed below in connexion with the 
practical application of the test. It may be noted that a detailed investigation carried out 
in fixed sample theory (Bartholomew, 1955) shows that the statistic $Q_n = -2\sum \log T_i$ for this 
alternative form of $\lambda(T)$ has a fairly high efficiency under a wide range of other alternatives. 
It seems reasonable, therefore, to assume that the sequential version of the test will have 
this property. 

For the joint density function of the $T$'s under the chosen alternative, we find 

$$p(T_1, T_2, \ldots, T_n) = \mu^n \prod_{i=1}^{n} (\mu T_i)^a \exp[-(\mu T_n)^{a+1}/(a+1)]. \qquad (2)$$


It will often be desirable to have a test which can be applied when the first few observations 
are missing or have been used to reach earlier decisions. For this reason it will be more 
convenient to work with the more general density function of $T_k, T_{k+1}, \ldots, T_n$ ($k \geq 1$) which 
may be obtained by integrating out $T_1, \ldots, T_{k-1}$ over the region $0 < T_1 < T_2 < \ldots < T_{k-1} < T_k$. 
We find 

$$p(T_k, T_{k+1}, \ldots, T_n) = \frac{\mu^{n-k+1} (\mu T_k)^{(k-1)(a+1)}}{(k-1)!\,(1+a)^{k-1}} \prod_{i=k}^{n} (\mu T_i)^a \exp[-(\mu T_n)^{a+1}/(a+1)]. \qquad (3)$$





3. THE SEQUENTIAL Q-TEST 


The parameter $\mu$ is of no interest and can be eliminated from the problem by carrying out 
a sequential transformation of the observations. Any transformation will lead to the same 
test provided it satisfies two conditions: 

(i) It must be possible to form the new variables in a sequential manner. 

(ii) The transformed variables must be independent of $\mu$. 

Probably the simplest transformation satisfying the required conditions is as follows: 

$$v_i = \frac{T_k^{\,k-1}\, T_k T_{k+1} \cdots T_i}{T_{i+1}^{\,i}} \quad (i = k, k+1, \ldots, n-1), \qquad V = \mu T_n.$$

The test is based on the new variables $v_k, v_{k+1}, \ldots, v_{n-1}$. The Jacobian of the transformation 
is of the form $V^{n-k}/\mu^{n-k+1}\,\phi_1(v_k, \ldots, v_{n-1})$, where $\phi_1$ is a function of the $v$'s not depending on $a$. 
We thus obtain the joint distribution of $v_k, v_{k+1}, \ldots, v_{n-1}$ and $V$ as 

$$p(v_k, \ldots, v_{n-1}, V) = \frac{v_{n-1}^{\,a}}{(k-1)!\,(1+a)^{k-1}}\, \phi_1(v_k, \ldots, v_{n-1})\, \phi_2(v_k, \ldots, v_{n-1})\, V^{n(a+1)-1} \exp[-V^{a+1}/(a+1)],$$

where $\phi_2 = (T_k/T_n)^{k-1}$ can be expressed as a function of the $v$'s which again is independent of $a$. 
Integrating out for $V$ from 0 to infinity, we finally obtain 

$$p(v_k, \ldots, v_{n-1}) = \frac{(n-1)!}{(k-1)!}\,(1+a)^{n-k}\, v_{n-1}^{\,a}\, \phi_1 \phi_2. \qquad (4)$$


It is now possible to form the sequential probability ratio test based on the $(n-k)$ 
variables $v_k, v_{k+1}, \ldots, v_{n-1}$. Thus 

$$R = \frac{p(v_k, \ldots, v_{n-1} \mid a = a_0)}{p(v_k, \ldots, v_{n-1} \mid a = 0)} = (1 + a_0)^{n-k}\, v_{n-1}^{\,a_0},$$

or taking logarithms and reverting to the original observations 

$$\log R = (n-k) \log(1 + a_0) - a_0\, Q(n, k),$$

where 

$$Q(n, k) = -\log v_{n-1} = (n-1) \log T_n - (k-1) \log T_k - \sum_{i=k}^{n-1} \log T_i.$$

In the special case $k = 1$ we notice that $Q(n, 1) = -\sum_{i=1}^{n-1} \log(T_i/T_n)$, a form which brings out 
the relation with the fixed sample Q-test for testing whether points are distributed at 
random on a line, already mentioned in § 2. We shall return to this point later. 

An important property of the test which will be useful in applications is the relation 

$$Q(n, k) = Q(n, 1) - Q(k, 1).$$
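
The statistic is simple to compute directly from the recorded times. The sketch below is illustrative rather than the author's own procedure; the function name `q_statistic` is an assumption, and `times[0]` is taken to hold $T_1$. Natural logarithms are used.

```python
import math


def q_statistic(times, n, k=1):
    """Q(n, k) = (n-1) log T_n - (k-1) log T_k - sum_{i=k}^{n-1} log T_i.

    times[0] is T_1, so T_i is times[i - 1]; requires 1 <= k <= n <= len(times).
    """
    if not 1 <= k <= n <= len(times):
        raise ValueError("need 1 <= k <= n <= len(times)")
    t = [None] + list(times)  # shift to 1-based indexing
    middle = sum(math.log(t[i]) for i in range(k, n))
    return (n - 1) * math.log(t[n]) - (k - 1) * math.log(t[k]) - middle
```

For example, with times 1, 2, 4, 8 the values are $Q(4,1) = 6\log 2$, $Q(2,1) = \log 2$ and $Q(4,2) = 5\log 2$, in agreement with the relation $Q(n,k) = Q(n,1) - Q(k,1)$. Multiplying every time by a constant leaves $Q$ unchanged, which reflects the elimination of the scale parameter.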











To carry out the test we adopt the rule: 
Accept $H_0$ if $\log R < \log B$. 
Reject $H_0$ if $\log R > \log A$. 
Continue sampling until either of these inequalities is satisfied. 
$A$ and $B$ are the usual constants related to the two kinds of error, $\alpha$ and $\beta$, by the equations 

$$A \simeq (1-\beta)/\alpha, \qquad B \simeq \beta/(1-\alpha).$$

In practice it is more convenient to rearrange the inequalities so that sampling is continued 
as long as 

$$\frac{-\log B + (n-k)\log(1+a_0)}{a_0} > Q(n, k) > \frac{-\log A + (n-k)\log(1+a_0)}{a_0},$$

the left-hand quantity being the acceptance boundary and the right-hand the rejection 
boundary. (The inequalities are written for $a_0 > 0$; for $a_0 < 0$ division by $a_0$ reverses them.) 
By the method of Cox (1952) it is easy to show that the test terminates with 
probability one. 
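
The rule can be applied observation by observation. The following is a sketch under stated assumptions, not a definitive implementation: it works with $\log R$ directly (so the sign of $a_0$ needs no special handling), uses natural logarithms, and updates $Q$ through the increment $Q(n,k) - Q(n-1,k) = (n-1)\log(T_n/T_{n-1})$, which follows from the definition of $Q(n,k)$. All function names are illustrative.

```python
import math


def wald_constants(alpha, beta):
    """Return (log A, log B) using A = (1-beta)/alpha, B = beta/(1-alpha)."""
    return math.log((1.0 - beta) / alpha), math.log(beta / (1.0 - alpha))


def q_boundaries(n, k, a0, alpha, beta):
    """Acceptance and rejection values of Q(n, k) at stage n (natural logs)."""
    log_a, log_b = wald_constants(alpha, beta)
    drift = (n - k) * math.log1p(a0)
    return (-log_b + drift) / a0, (-log_a + drift) / a0


def sequential_q_test(times, a0, alpha, beta, k=1):
    """Apply the rule to T_k, T_{k+1}, ... (times[0] is T_1).

    Returns (decision, n), where decision is 'accept H0', 'reject H0',
    or 'continue' if the data run out before a boundary is crossed.
    """
    log_a, log_b = wald_constants(alpha, beta)
    q = 0.0  # Q(k, k) = 0
    for n in range(k + 1, len(times) + 1):
        # Q(n, k) - Q(n-1, k) = (n-1) * log(T_n / T_{n-1})
        q += (n - 1) * math.log(times[n - 1] / times[n - 2])
        log_r = (n - k) * math.log1p(a0) - a0 * q
        if log_r < log_b:
            return "accept H0", n
        if log_r > log_a:
            return "reject H0", n
    return "continue", len(times)
```

As a usage illustration, `sequential_q_test(times, a0=0.5, alpha=0.05, beta=0.05)` on evenly spaced times ends in acceptance of $H_0$, while times of the form $i^{1/3}$, whose rate increases steeply, lead to rejection.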

It sometimes happens that the sample point $(n, Q(n, k))$ falls near the upper test boundary 
and is followed by a long delay before the occurrence of the $(n+1)$th event. The question 
then arises as to whether a decision can be reached before this event occurs. If in the formula 
for $Q(n+1, k)$ we replace $T_{n+1}$ by the time at which a decision is required, say $T'$ ($T_n < T' < T_{n+1}$), 
and denote the value of $Q$ so obtained by $Q'(n+1, k)$ we have 

$$Q(n+1, k) > Q'(n+1, k).$$

If the point $(n+1, Q'(n+1, k))$ falls outside the boundary so will $(n+1, Q(n+1, k))$, so 
a decision can be obtained more quickly in this way. No such rule can be used at the lower 
boundary, nor is one necessary, because a long delay will always move the sample point 
towards the upper boundary when there is no question of the lower boundary being crossed. 


4. PROPERTIES OF THE TEST 
(i) The operating characteristic (O.C.) 
The O.C. function, which is the probability of accepting the null hypothesis when $a$ is the 
true value, is given by 

$$L(a) = \frac{A^{h(a)} - 1}{A^{h(a)} - B^{h(a)}},$$

where $h(a)$ is the non-zero root of the equation 

$$\int_S \left[\frac{p(v_k, \ldots, v_{n-1} \mid a = a_0)}{p(v_k, \ldots, v_{n-1} \mid a = 0)}\right]^{h} p(v_k, \ldots, v_{n-1} \mid a)\, dv_k \ldots dv_{n-1} = 1,$$

and $S$ is the region in which the sample point must lie. 
Substituting known quantities in this equation, it becomes 

$$(1+a_0)^{(n-k)h} \int_S v_{n-1}^{\,h a_0}\, p(v_k, \ldots, v_{n-1} \mid a)\, dv_k \ldots dv_{n-1} = 1,$$

or 

$$(1+a_0)^{(n-k)h}\, \mathscr{E}(v_{n-1}^{\,h a_0} \mid a) = 1.$$

Now $v_{n-1} = (T_k/T_n)^{k-1} \prod_{i=k}^{n-1} (T_i/T_n)$. 
Putting $y_i = T_i/T_n$ $(i = k, \ldots, n-1)$, $Y = \mu T_n$, in the joint distribution of the $T$'s, we obtain 

$$p(y_k, y_{k+1}, \ldots, y_{n-1}) = \frac{(n-1)!}{(k-1)!}\,(1+a)^{n-k}\, y_k^{(k-1)(a+1)} \prod_{i=k}^{n-1} y_i^{\,a}.$$

Hence it is found that 

$$\mathscr{E}(v_{n-1}^{\,h a_0} \mid a) = \left(1 + \frac{h a_0}{1+a}\right)^{-(n-k)},$$

so that the equation for $h$ becomes 

$$\left(1 + \frac{h a_0}{1+a}\right)^{n-k} = (1+a_0)^{(n-k)h}, \quad\text{that is,}\quad 1 + \frac{h a_0}{1+a} = (1+a_0)^{h}, \qquad (5)$$

which is independent of $n$ and $k$. Corresponding to any solution of equation (5) with $h$, $a_0$ 
and $a$ there will correspond a second solution with $-h$, $b_0$ and $b$ where 

$$b_0 = -a_0/(1+a_0), \qquad b = (a - a_0)/(1+a_0). \qquad (6)$$
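
The correspondence in (6) can be checked in a few lines. The following is a sketch of the verification, using only equation (5) in its reduced form and the definitions in (6); it is offered as a worked check rather than as the author's own argument.

```latex
\begin{align*}
1 + b_0 &= \frac{1}{1+a_0}, \qquad 1 + b = \frac{1+a}{1+a_0}, \\
\frac{b_0}{1+b} &= -\frac{a_0}{1+a_0}\cdot\frac{1+a_0}{1+a} = -\frac{a_0}{1+a}, \\
1 + \frac{(-h)\,b_0}{1+b} &= 1 + \frac{h\,a_0}{1+a} = (1+a_0)^{h} = (1+b_0)^{-h}.
\end{align*}
```

The last line is equation (5) with $(h, a_0, a)$ replaced by $(-h, b_0, b)$, so $-h$ is the root belonging to the pair $(b_0, b)$.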


Table 1. The value of $L(a)$ for various $a$ 

  a                    -1      0        a'                         a_0     ∞
  L(a)  (a_0 < 0)       0      1 - α    log A/(log A - log B)      β       1
  L(a)  (a_0 > 0)       1      1 - α    log A/(log A - log B)      β       0








Then it is easy to show that $L(b) = 1 - L(a)$, with $\alpha$ and $\beta$ interchanged. As $h$ tends to zero, 
$L(a)$ approaches the value $\log A/(\log A - \log B)$, or $\tfrac{1}{2}$ if $\alpha = \beta$. Expanding the right-hand 
side of equation (5) we find that 

$$1 + h \log(1+a_0) + O(h^2) = 1 + h a_0/(1+a),$$

which has a root $h = 0$ if $a = a_0/\log(1+a_0) - 1$. Denoting this value of $a$ by $a'$, we have on 
expansion 

$$a' = \tfrac{1}{2} a_0 - \tfrac{1}{12} a_0^2 + O(a_0^3).$$

The five points on the O.C. curve given in Table 1 are sufficient for many purposes when only 
a rough indication of its shape is required. A more detailed solution can be obtained by 
solving (5) for other values of $h$. As a first approximation $h \simeq 1 - 2a/a_0$; more accurate values 
may be obtained by an iterative procedure or more simply from the intersection of the two 
graphs 

$$y = 1 + h a_0/(1+a) \quad\text{and}\quad y = \exp[h \log_e(1+a_0)].$$

Charts could easily be constructed for this purpose if required. As an indication of the extent 
to which the probability of picking out a departure from randomness falls off when the true 
value of $a$ is rather less than $a_0$, the results, all calculated for $a = 0.8a_0$, given in Table 2 are 
perhaps instructive. 
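
The iterative solution of (5) mentioned above can be sketched numerically. The code below is an illustration under stated assumptions: it solves the reduced form $1 + h a_0/(1+a) = (1+a_0)^h$ by bisection (a choice made here for simplicity, not a method prescribed by the paper) and then evaluates $L(a)$ with $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$. The function names are hypothetical, and no care is taken over overflow for extreme values of $a$.

```python
import math


def oc_root(a, a0):
    """Non-zero root h of 1 + h*a0/(1+a) = (1+a0)**h (0.0 at a = a')."""
    c = a0 / (1.0 + a)
    log1 = math.log1p(a0)

    def f(h):
        return math.exp(h * log1) - 1.0 - h * c

    slope = log1 - c  # f'(0); its sign tells on which side the root lies
    if abs(slope) < 1e-12:
        return 0.0
    sign = 1.0 if slope < 0.0 else -1.0
    lo = sign
    while f(lo) >= 0.0:  # move towards 0 until f is negative
        lo *= 0.5
        if abs(lo) < 1e-9:
            return 0.0
    hi = sign
    while f(hi) <= 0.0:  # move away from 0 until f is positive
        hi *= 2.0
    for _ in range(200):  # bisection: f(lo) < 0 < f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def operating_characteristic(a, a0, alpha, beta):
    """L(a) = (A**h - 1) / (A**h - B**h), with the h -> 0 limit handled."""
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    h = oc_root(a, a0)
    if abs(h) < 1e-9:
        return log_a / (log_a - log_b)
    ah, bh = math.exp(h * log_a), math.exp(h * log_b)
    return (ah - 1.0) / (ah - bh)
```

As a check on the sketch, $a = 0$ gives $h = 1$ and $L = 1 - \alpha$, while $a = a_0$ gives $h = -1$ and $L = \beta$, matching two of the five points in Table 1.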


(ii) The average sample number (A.S.N.) 
Let us define 

$$z_i = \log \frac{p(v_i \mid v_{i-1}, \ldots, v_k;\ a = a_0)}{p(v_i \mid v_{i-1}, \ldots, v_k;\ a = 0)},$$

which in our case becomes 

$$z_i = \log(1 + a_0) + a_0(\log v_i - \log v_{i-1}).$$

Wald (1947) showed that, provided the expected value of $z_i$ does not depend on $i$, the 
A.S.N. was given by 

$$\mathscr{E}_a(n \mid a_0) \simeq \frac{L(a) \log B + (1 - L(a)) \log A}{\mathscr{E}_a(z_i)},$$

where $\mathscr{E}_a(n \mid a_0)$ and $\mathscr{E}_a(z_i)$ are the expected values of $n$ and $z_i$ when $a = a_0$ is used in the test 
and the true value is $a$. 


Table 2. Values of $1 - L(a)$ when $a = 0.8a_0$ 

                                      Value of 1 - L(a) when
   a_0        a           h          α = β = 0.05    α = β = 0.01
  -0.5       -0.4        -0.516         0.820           0.915
  -2/7       -1.6/7      -0.560         0.839           0.929
   0.4        0.32       -0.633         0.866           0.948
   1.0        0.8        -0.664         0.876           0.955








By the transformation of the previous section it is easily shown that $\mathscr{E}(-\log v_i) = i/(a+1)$ 
and hence that 

$$\mathscr{E}_a(z_i) = \log(1 + a_0) - a_0/(1 + a),$$

which is independent of $i$. At the point $a = a'$ the above expression for the average sample 
number becomes indeterminate; in this case Wald showed that 

$$\mathscr{E}_{a'}(n \mid a_0) \simeq -\log A \log B/\mathscr{E}_{a'}(z_i^2).$$

He also remarked (1947) that the maximum value of $\mathscr{E}_a(n \mid a_0)$ usually occurs at or near this 
point. By a method similar to that used above, or directly from a theorem of Epstein & 
Sobel (1954), it may be shown that 

$$\mathscr{E}_{a'}(z_i^2) = [\log(1 + a_0)]^2,$$

so that 

$$\mathscr{E}_{a'}(n \mid a_0) \simeq -\log A \log B/[\log(1 + a_0)]^2.$$
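
The A.S.N. formulae above are easy to evaluate. The sketch below is illustrative only: the function names are assumptions, the value of $L(a)$ must be supplied by the caller (for instance $1-\alpha$ at $a = 0$ and $\beta$ at $a = a_0$), and the limiting form is used whenever $\mathscr{E}_a(z_i)$ is numerically zero.

```python
import math


def expected_z(a, a0):
    """E_a(z_i) = log(1 + a0) - a0 / (1 + a)."""
    return math.log1p(a0) - a0 / (1.0 + a)


def a_prime(a0):
    """Value of a at which h = 0: a' = a0 / log(1 + a0) - 1."""
    return a0 / math.log1p(a0) - 1.0


def average_sample_number(a, a0, alpha, beta, oc_value):
    """Wald's approximation to E_a(n | a0).

    oc_value is L(a), supplied by the caller; it is ignored at a = a',
    where E_a(z_i) = 0 and the limiting form is used instead.
    """
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    ez = expected_z(a, a0)
    if abs(ez) < 1e-12:
        return -log_a * log_b / math.log1p(a0) ** 2
    return (oc_value * log_b + (1.0 - oc_value) * log_a) / ez
```

With $a_0 = 0.2$ and $\alpha = \beta = 0.05$ the limiting form at $a = a'$ gives about 261, the figure tabulated for that case; at $a = 0$ and $a = a_0$ the approximation gives values close to 150 and 169.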


Numerical values for $\mathscr{E}_a(n \mid a_0)$ are given in Tables 3 and 4. Table 3 shows how the A.S.N. 
varies with the choice of $\alpha$, $\beta$, $a_0$ and the true value $a$. If the table is used with negative values 
of $a_0$ and $a$ then it may easily be shown that 

$$\mathscr{E}_b(n \mid b_0) = \mathscr{E}_a(n \mid a_0)$$

with $\alpha$ and $\beta$ interchanged and the relation between $b_0$, $b$ and $a_0$, $a$ as given in equation (6). 
The row $a = 1$ is of special interest because it represents a linear increase in $\lambda(T)$. Corresponding 
to this we have $b = 1 + 2b_0$, but this no longer has the same physical significance. 
The values $a = -1$ and $a = \infty$ represent the most extreme departures from randomness 
that are obtainable with the chosen form of alternative. 

Table 4 gives a more extensive tabulation of $\mathscr{E}_0(n \mid a_0)$ and $\mathscr{E}_{a_0}(n \mid a_0)$ to assist in setting 
up a test. In the same way as for Table 3 it may be used for negative $a_0$. 

Table 3. Approximate values of $\mathscr{E}_a(n \mid a_0)$ for various $\alpha$, $\beta$ and $a_0$ 

  a_0                        0.2         0.4         0.6         0.8         1.0
  α                       0.01  0.05  0.01  0.05  0.01  0.05  0.01  0.05  0.01  0.05
  a       β                                                                            α             b
  -1      0.01 or 0.05       0     0     0     0     0     0     0     0     0     0   0.01 or 0.05  -1
  -0.25   0.01              54    54    23    23    14    14    10    10     7     7   0.01          (3b_0 - 1)/4
          0.05              35    35    15    15     9     9     6     6     5     5   0.05
  0       0.01             256   237    72    67    36    33    22    21    16    15   0.01          b_0
          0.05             166   151    47    43    23    21    15    14    11    10   0.05
  a'      0.01             635   409   187   120    96    62    61    41    44    28   0.01          b'
          0.05             409   261   120    77    62    39    41    25    28    18   0.05
  a_0     0.01             289   187    90    58    48    32    32    21    24    16   0.01          0
          0.05             268   170    83    53    45    29    30    19    23    15   0.05
  0.5     0.01             [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]   0.01          (1 + 3b_0)/2
          0.05              93    60    64    40   [?]   [?]   [?]   [?]   [?]   [?]   0.05
  1.0     0.01              55    36    34   [?]   [?]    18    24    16    24    16   0.01          1 + 2b_0
          0.05              55    36    33    21    26    17    23    15    23    15   0.05
  ∞       0.01              25    16    14     9    10     6     8     5     7     4   0.01          ∞
          0.05              25    16    14     9    10     6     8     5     7     4   0.05
  β                       0.01  0.05  0.01  0.05  0.01  0.05  0.01  0.05  0.01  0.05
  b_0                       -1/6        -2/7        -3/8        -4/9        -1/2

For $a_0 > 0$, use the top and left-hand marginal scale; for negative $a_0$, written as $b_0$, use the bottom 
and right-hand marginal scale. 


(iii) Saving in sample size effected by the sequential test 


As an alternative to the application of the sequential test just described a succession of 
fixed sample tests could be used. In this case we should decide in advance to observe a fixed 
number of events and then to carry out a test for a trend. At the occurrence of the $n$th event 
$(n-1)$ observations $T_1, T_2, \ldots, T_{n-1}$ would have been made in the interval $(0, T_n)$. It is easily 
verified from equation (2) that the variables defined by $y_i = T_i/T_n$ $(i = 1, 2, \ldots, n-1)$ have 
the same joint distribution as a sample of $(n-1)$ from the population 


$$p(y) = (a+1)\,y^a \qquad (0 < y < 1). \qquad (7)$$

Table 4. The average sample number for the sequential Q-test 
The upper figure is $\mathscr{E}_{a_0}(n \mid a_0)$ and the lower $\mathscr{E}_0(n \mid a_0)$. 

  α        0.01   0.025  0.05   0.01   0.025  0.05   0.01   0.025
  β        0.01   0.01   0.01   0.025  0.025  0.025  0.05   0.05
  a_0
  0.175    366    293    237    356    279    229    340    269
           329    320    305    263    251    242    213    205
  0.20     289    231    187    281    220    180    268    212
           256    248    237    204    195    188    166    160
  0.225    235    188    152    228    179    147    218    173
           205    199    190    164    156    151    133    128
  0.25     196    156    127    190    149    122    182    144
           169    164    157    135    129    124    109    105
  0.30     144    115     93    138    110     90    132    106
           121    116    112     97     92     89     78     75
  0.35     111     89     72    108     85     70    103     82
            91     89    [?]     73     70     67     59     57
  0.40      90     72     58     87     69     56     83     66
            72     70     67     58     55     53     47     45
  0.45      75     60     49     72     57     47     69     55
            58     57     54     47     45     43     38     37
  0.50      63     51     41    [?]     49     40     59     47
            49     47     45     39     37     36     32     30
  0.60      48     39     32     47     37     31     45     36
            36     35     33     29     27     26     23     23
  0.70      39     31     26     38     30     25     36     29
            28     27     26     22     21     21     18     18
  0.80      32     26     21     32     25     21     30     24
            22     22     21     18     17     17     15     14
  0.90      28     22     18     27     21     18     26     21
            18     18     17     15     14     14     12     12
  1.00      24     20     16     24     19     16     23     18
            16     15     15    [?]     12     12     11     10
For negative values of the parameter, take $b_0 = -a_0/(1+a_0)$ and interchange $\alpha$ and $\beta$ so that we have 
$\mathscr{E}_0(n \mid b_0) = \mathscr{E}_{a_0}(n \mid a_0)$, $\mathscr{E}_{b_0}(n \mid b_0) = \mathscr{E}_0(n \mid a_0)$. 

Pearson (1938) showed that the test based on the statistic 

$$Q_n = -2 \sum_{i=1}^{n-1} \log y_i = -2 \sum_{i=1}^{n-1} \log(T_i/T_n) = 2Q(n, 1)$$

was the uniformly most powerful test of the hypothesis $a = 0$ with respect to the class of 
alternatives defined by (7) with $a > 0$. By making the transformation $u = -2\log y$ in (7) 
we find that 

$$p(u) = \tfrac{1}{2}(a+1)\, e^{-\frac{1}{2}(a+1)u},$$

from which it follows that $Q_n$ is distributed as $\chi^2_{2(n-1)}/(1+a)$. A more general form of $Q_n$ 
corresponding to $Q(n, k)$ has been given by the author in unpublished work. The following 
analysis is applicable to the more general case by writing $(n-k+1)$ in place of $n$. 

The purpose of this section is to investigate the saving in sample size achieved by adopting 
the sequential procedure; this is important because it means that decisions will be reached 
more quickly using this method. 

The number of observations, $n$, required for the fixed sample test of strength $(\alpha, \beta)$ is 
determined by finding the degrees of freedom, $\nu = 2(n-1)$, for which the distribution of $\chi^2$ 
has the upper $100\beta$% point = $(1 + a_0)$ × its lower $100\alpha$% point. If we limit discussion to 
values of $a_0$ sufficiently small (say $a_0 < 0.5$), so that $\nu$ will be large enough to make use of 
Fisher's normal approximation to the distribution of $\chi^2$, i.e. take $\sqrt{2\chi^2}$ to be $N(\sqrt{2\nu - 1}, 1)$, 
then for given values of $a_0$, $\alpha$ and $\beta$ we have to solve the equation 

$$\sqrt{4n-5} + X_\beta = (1 + a_0)^{1/2}\,(\sqrt{4n-5} - X_\alpha).$$

Here $X_\alpha$ and $X_\beta$ are the appropriate standardized deviates of the normal curve, e.g. 
$\alpha = \int_{X_\alpha}^{\infty} e^{-z^2/2}\,dz/\sqrt{2\pi}$. Solving the equation for $n$, we find 

$$n = 1 + \tfrac{1}{4}\{1 + (X_\alpha (1 + a_0)^{1/2} + X_\beta)^2/(1 - (1 + a_0)^{1/2})^2\}.$$
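
The fixed sample size and the resulting saving can be computed as follows. This is a sketch under stated assumptions: the normal deviates are taken from Python's standard `statistics.NormalDist`, the function names are illustrative, and the A.S.N. passed to `percentage_saving` must come from elsewhere (for example from the tabulated values).

```python
import math
from statistics import NormalDist


def fixed_sample_size(a0, alpha, beta):
    """n from the normal approximation: 1 + (1/4){1 + (X_a r + X_b)^2 / (1 - r)^2},
    where r = sqrt(1 + a0) and X_a, X_b are upper-tail normal deviates."""
    r = math.sqrt(1.0 + a0)
    x_alpha = NormalDist().inv_cdf(1.0 - alpha)
    x_beta = NormalDist().inv_cdf(1.0 - beta)
    return 1.0 + 0.25 * (1.0 + (x_alpha * r + x_beta) ** 2 / (1.0 - r) ** 2)


def percentage_saving(asn, n_fixed):
    """Average saving of the sequential test relative to the fixed sample size."""
    return 100.0 * (1.0 - asn / n_fixed)
```

For $a_0 = 0.5$ and $\alpha = \beta = 0.01$ the formula gives $n$ of about 134; combined with the tabulated A.S.N. of 49 under $H_0$ this corresponds to a saving of roughly 63%.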


Table 5. Average percentage saving in sample size if sequential test used 

  a_0              0.5             0.3          Limit as a_0 -> 0
  α             0.05   0.01     0.05   0.01       0.05   0.01
  β
  0.05           [?]    68       54    [?]         51     63       H_0 true,
  0.01            52    63       50     62         47     58       a = 0
  0.05            43    41       46     43         51     47       H_1 true,
  0.01            56    53       59     54         63     58       a = a_0











Table 5 gives the average saving in sample size, using the sequential test, for various $a_0$, 
expressed as a percentage. This is applicable for negative $a_0$ by taking $b_0 = -a_0/(1+a_0)$ and 
interchanging both $\alpha$ and $\beta$, and '$H_0$ true' and '$H_1$ true'. 

The saving in sample size is never less than 40% over the range of $a_0$, $\alpha$ and $\beta$ 
likely to be encountered in practice; in fact it is often considerably more than this. 

5. THE TWO-SIDED TEST 


Where interest centres on the detection of both increases and decreases, a two-sided version 
of the test is required. 

In this case we test the hypothesis $H_0$ ($a = 0$) against the alternative $H_1$ ($a = a_0$) or 
$H_2$ ($a = -a_1$); because of the symmetry of the two cases $a = a_0$ and $a = -a_0/(1+a_0)$, it may 
be convenient to take $-a_1 = -a_0/(1+a_0) = b_0$, although this is not necessary, of course. 

Two methods have been put forward to deal with this situation. The first uses a likelihood 
ratio formed by taking the simple average of the ratios required for the separate tests, and 
the second consists of running two one-sided tests simultaneously. The latter method will 
be preferred because it requires no additional calculation; it is not difficult to see the relation 
between the two methods. 

Details of how the test is carried out are given in Armitage (1947); the procedure becomes 
ambiguous if it is possible to accept both $H_1$ and $H_2$, but this is only possible for unusual 
values of $a_0$, $\alpha$ and $\beta$, and even then the probability of such a happening is infinitesimal. 

The O.C. function and A.S.N. function have not been obtained for the two-sided test, but 
in practice our knowledge for the one-sided scheme will be adequate for setting up a test 
procedure. 


6. APPLICATION OF THE TEST 


As usual in setting up a sequential test we are faced with the initial problem of deciding on 
the most appropriate specification of parameter values for the alternative hypotheses $H_0$ 
and $H_1$ (and $H_2$ also if the two-sided test is used). In the present situation $H_0$ is clearly the 
hypothesis of randomness with $a = 0$, but as is frequently the case in other types of problem 
there is no clear-cut simple alternative $H_1$. The most appropriate alternative to introduce 
has to be settled paying regard to the O.C. curve and the A.S.N. associated with different 
values of the parameter and $\alpha$ and $\beta$.† 

There is also further difficulty in the present case. The practical significance of changes 
in the parameter $a$ is less easy to grasp than that of changes in, say, the mean of a normal 
distribution or the probability $p$ of a binomial. To help in the physical interpretation of $a$, 
Table 6 has been prepared giving, for different values of $a$, the number of events which would 
occur, on the average, in successive equal periods of time starting from $T = 0$. These results 
have been standardized by making the expectation in the first period equal to unity. 
Although these can be obtained from the figures for the first six periods, we have also given 
at the bottom of the table the expected cumulative totals after 5, 10, ..., 25, 30 periods. 

The values in the table are based on the relation 

$$\frac{N_2}{N_1} = \left(\frac{t_2}{t_1}\right)^{a+1},$$

where $N_1$ = expected number occurring in $(0, t_1)$ and $N_2$ = expected number occurring in 
$(0, t_2)$, $t_2 > t_1$. 
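
The relation above is enough to regenerate the tabulated expectations. The sketch below is illustrative (the function names are assumptions): with the first unit period standardized to 1, the cumulative total after $t$ periods is $t^{a+1}$ and the expectation in period $j$ is the difference of successive totals.

```python
def cumulative_ratio(t, a):
    """Expected number in (0, t) as a multiple of the expectation in (0, 1)."""
    return t ** (a + 1.0)


def period_ratio(j, a):
    """Expected number in the j-th unit period, relative to the first period."""
    return cumulative_ratio(j, a) - cumulative_ratio(j - 1, a)
```

For instance, `period_ratio(3, 0.4)` is about 2.02 and `cumulative_ratio(5, 0.4)` is about 9.52, while for a linear increase (`a = 1`) the 30th period gives exactly 59.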
It will be seen that broadly speaking the $\lambda(T)$ law corresponds to a situation where there 
is a sharp initial change in the expectation with a gradual slowing off. Thus for $a = 0.4$, the 
expectation in the 3rd period is double that in the 1st, but this is not doubled again until 
the 14th period is reached. The way in which this table may be used in combination with 
Tables 3 and 4 will be illustrated in connexion with the example given below. 

† Were it readily calculable, information regarding the distribution of the decisive sample number 
as well as its average value would also be valuable. 

The fact that the test may be applied to a sequence of observed times $T_k, T_{k+1}, \ldots$, where 
$(k-1)$ observations have occurred in the interval $(0, T_k)$, permits a certain amount of variety 
in its use. The most straightforward application is where the origin for $T$ is taken at the point 
of time when observations start; here $k = 1$. If a decision is reached at $T_{k_1-1}$, and we wish to 


Table 6. Expected number of events under the alternative $\lambda(T) = \mu(\mu T)^a$ in successive 
unit periods expressed as multiples of the expectation for the first period 

                          a > 0                                     a < 0
  Period     1.0      0.6      0.4      0.3      0.2      -1/6    -3/13    -2/7    -3/8    -1/2
     1       1.00     1.00     1.00     1.00     1.00     1.00    1.00     1.00    1.00    1.00
     2       3.00     2.03     1.64     1.46     1.30     0.78    0.70     0.64    0.54    0.41
     3       5.00     2.77     2.02     1.71     1.44     0.72    0.63     0.55    0.45    0.32
     4       7.00     3.39     2.30     1.89     1.54     0.68    0.58     0.50    0.39    0.27
     5       9.00     3.94     2.56     2.04     1.62     0.65    0.54     0.47    0.35    0.24
     6      11.00     4.45     2.77     2.17     1.69     0.63    0.52     0.44    0.33    0.21
     7      13.00     4.92     2.96     2.28     1.74     0.61    0.50     0.41    0.31    0.20
     8      15.00     5.36     3.13     2.38     1.80     0.59    0.48     0.40    0.30    0.18
     9      17.00     5.77     3.29     2.47     1.84     0.58    0.47     0.39    0.28    0.17
    10      19.00     6.18     3.45     2.55     1.88     0.57    0.46     0.38    0.27    0.16
    15      29.00     7.96     4.08     2.90     2.05     0.53    0.42     0.33    0.23    0.13
    20      39.00     9.51     4.59     3.17     2.17     0.51    0.39     0.31    0.21    0.11
    25      49.00    10.91     5.04     3.39     2.27     0.49    0.37     0.29    0.19    0.10
    30      59.00    12.19     5.42     3.59     2.36     0.48    0.35     0.27    0.18    0.09

  Expected cumulative totals
   1-5      25.00    13.13     9.52     8.10     6.90     3.82    3.45     3.16    2.73    2.24
   1-10    100.00    39.81    25.12    19.95    15.85     6.81    5.88     5.18    4.22    3.16
   1-15    225.00    76.16    44.31    33.80    25.78     9.55    8.03     6.92    5.43    3.87
   1-20    400.00   120.68    66.29    49.13    36.41    12.14   10.02     8.50    6.50    4.47
   1-25    625.00   172.47    90.60    65.66    47.59    14.62   11.89     9.97    7.48    5.00
   1-30    900.00   230.88   116.94    83.23    59.23    17.02   13.68    11.35    8.38    5.48








continue looking out for changes in $\lambda(T)$, we may now start again with a new sequence of 
times $T_i^* = T_{k_1+i-1} - T_{k_1-1}$ $(i = 1, 2, \ldots)$. This procedure would clearly be appropriate if, for 
example, the events were machine failures and the machine was reset when an increase in 
the failure rate had been established. 

On the other hand, if a decision in favour of $H_0$ is reached after $(k_1 - 1)$ observations it is 
possible to continue observation with $T$ measured from the initial origin. The test statistic 
to be compared with the boundaries given in § 3 would now be 

$$Q(n, k_1) = Q(n, 1) - Q(k_1, 1).$$

It is important to notice that the inference made at the second decision in this case applies 
to the whole period from the initial origin and not to the period from the last decision. As long 












as decisions of the same kind are reached it would seem that the process can be continued, 
using $Q(n, k_s) = Q(n, 1) - Q(k_s, 1)$ after the $s$th decision. If, however, there is a change in 
decision from $H_0$ to $H_1$ or vice versa, justification for the further use of the original origin 
appears more doubtful. It is clear that here, as in all sequential applications, if a change in 
the probability law occurs during the course of observation, the strict theoretical basis of 
the test breaks down and its utility must to a large extent be based on empirical study. Some 
further investigation of this kind is in hand in the present case. 


7. A NUMERICAL EXAMPLE 


We have taken the data given by Maguire et al. (1952) for explosions in coal-mines in Great 
Britain since 1875 involving the loss of ten lives or more. These authors gave the intervals 
in days between successive accidents from which values of $T$ with origin at 6 December 
1875 can readily be found. Fig. 1 shows a plot of the observations. 




















[Figure 1: time plot in four strips, with axes marked 0 to 7, 7 to 14, 14 to 21 and 21 to 26 units; the 1st, 2nd and 3rd decisions are marked.] 

Fig. 1. Time plot of mine accident data. Time scale: one unit = 1000 days. 
Origin at 6 December, 1875. 


Let us suppose that, starting from the beginning of the record, we are on the look-out for 
evidence of a reduction in accident rate. This means that $a_0$ must be taken negative; for 
clearness in reference to the tables we shall denote this value by $b_0$. In a situation of this 
kind we should like to reach a provisional opinion as to what was happening fairly rapidly 
and there would be no object in playing for safety using very small values of $\alpha$ and $\beta$. Let 
us therefore take the largest values of $\alpha$ and $\beta$ given in Tables 3 and 4, namely, $\alpha = \beta = 0.05$. 
Then, using Table 3, let us examine the consequences of choosing particular values 
for $b_0$. 

(i) If we adjust the test so as to be efficient in picking out a small value of $b_0$, say $-\tfrac{1}{6}$, we 
must expect to wait a very long time for an answer. It will be seen that $\mathscr{E}_0(n \mid b_0) = 170$ and 
$\mathscr{E}_{b_0}(n \mid b_0) = 151$. Supposing an accident rate of 3 per annum in the year 1875, this means that 
using the test so adjusted we might expect to wait 57 years before establishing that $\lambda(T)$ 
was remaining constant. The reason for this is that a change in $\lambda(T)$ represented by the 
expectations in the column headed $-\tfrac{1}{6}$ in Table 6 would be very hard to distinguish 
from a random situation because, after the initial drop in frequency (which may be obscured 
by sampling fluctuations) there is only a very gradual falling off. For example, as Table 6 
shows, with $b = -\tfrac{1}{6}$ the expectation in the 30th period is 26% below that in the 5th period 
while for $b = -\tfrac{3}{8}$ it is 49% less. 


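[Figure 2: plots of $Q(n, 1)$ (upper part) and $Q(n-23, 1)$ (lower part) against $n$. Axes: scale of $n$; scale of $Q(n, 1)$; scale of $Q(n-23, 1)$. Note on figure: boundaries are solid lines for $\alpha = \beta = 0.05$, broken lines for $\alpha = \beta = 0.10$.] 

Fig. 2. Mine accident data; application of sequential tests with $a_0\,(= b_0) = -\tfrac{1}{3}$. 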








(ii) If $b_0 = -\tfrac{1}{6}$ and the true decrease is more marked, say $b = -\tfrac{3}{8}$, then we see from 
Table 3 that this change would be detected, on average, after 35 observations.† This is more 
satisfactory, but we note that if the test had been adjusted to be most efficient for $b = -\tfrac{3}{8}$, 
i.e. by making $b_0 = -\tfrac{3}{8}$, then on the average only 21 observations instead of 35 would have 
been needed to reach a decision. 

(iii) Keeping $b_0 = -\tfrac{1}{6}$ we should expect to detect a smaller decrease of $b = b' = -\tfrac{1}{12}$ with 
probability $\tfrac{1}{2}$, but this, on average, would take 261 observations. To sum up, with $b_0 = -\tfrac{1}{6}$, 
we shall stand a good chance of detecting any decrease of importance, but we may have to 
wait a very long time to do so. 

(iv) If we go to the other extreme and take $b_0 = -\tfrac{1}{2}$, Table 3 shows that a decision will be 
reached fairly quickly whatever the true value of $b$. A decrease with $b = \tfrac{1}{4}(3b_0 - 1) = -\tfrac{5}{8}$ 
will be reached almost as quickly as if $b_0$ had been chosen equal to $-\tfrac{5}{8}$; however, though 
decisions will be reached quickly if $|b| < |b_0|$, say in the neighbourhood of $-\tfrac{1}{4}$, half of them will 
be incorrect, i.e. half will be in favour of $H_0$ or randomness. 

† This figure comes from the section of the table with $b_0 = -\tfrac{1}{6}$, $b = \tfrac{1}{4}(3b_0 - 1) = -\tfrac{3}{8}$. 

These considerations suggest that an intermediate value of $b_0$, say $-\tfrac{1}{3}$ (corresponding to 
the positive value of $a_0 = 0.5$) will be most appropriate. If the series is random we should 
reach a decision for $H_0$ on the average after 43 observations, while if the expectation is 
decreasing with $a_0 = -\tfrac{1}{3}$, after 29 observations. 

If we are prepared to take greater risks of wrong decisions, e.g. make $\alpha = \beta = 0.10$, the 
process could, of course, be speeded up. 

Fig. 2 shows the values of $Q$ plotted against $n$, taking $a_0 = -\tfrac{1}{3}$. This diagram should be 
studied in conjunction with Fig. 1. We start on the left with values of 

$$Q(n, 1) = (n-1) \log_{10} T_n - \sum_{i=1}^{n-1} \log_{10} T_i,$$

and draw the boundary lines for $\alpha = \beta = 0.05$ (solid rules) at $(n-1) \times 0.528 \pm 3.836$. A decision 
in favour of randomness is reached at the 23rd accident for which $T_{23} = 2326$ days. Had 
we used $\alpha = \beta = 0.10$ (boundaries, the broken lines) a decision in favour of randomness 
would have been reached at $T_{[?]}$. 
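
The boundary constants quoted for this example can be reproduced from the inequalities of § 3. The sketch below is illustrative, with a hypothetical function name; it assumes common logarithms (as in the expression for $Q(n, 1)$ above), a negative value $b_0$, and $k = 1$, so that the boundaries are the slope times $(n-1)$ plus or minus the two offsets.

```python
import math


def log10_boundary_constants(b0, alpha, beta):
    """Slope and offsets of the Q boundaries in common logs, for b0 < 0.

    Rejection boundary:  (n - k) * slope + reject_offset   (upper line)
    Acceptance boundary: (n - k) * slope - accept_offset   (lower line)
    """
    if b0 >= 0.0:
        raise ValueError("this sketch is written for negative b0 only")
    slope = math.log10(1.0 + b0) / b0
    reject_offset = math.log10((1.0 - beta) / alpha) / abs(b0)
    accept_offset = -math.log10(beta / (1.0 - alpha)) / abs(b0)
    return slope, accept_offset, reject_offset
```

With $b_0 = -\tfrac{1}{3}$ and $\alpha = \beta = 0.05$ this returns a slope of about 0.528 and offsets of about 3.836; with $\alpha = \beta = 0.10$ the offsets shrink to about 2.863, which is why the broken boundaries lie inside the solid ones.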

If it is wished to continue the sequential test beyond $T_{23}$ (about the beginning of the year 
1883), two courses are now open. We may keep the origin at the original starting point and 
calculate $Q(n, 24)$; this is effectively done by still plotting $Q(n, 1)$ but shifting the limits as 
shown in Fig. 2, so that the new origin coincides with point $Q(k_1, 1)$. Following this plan we 
reach two successive decisions in favour of a drop in the accident rate: at 8042 days 
and at 16051 days. These results are shown in the upper part of Fig. 2. 

On the other hand, we may start again with a new origin at $T_{23}$, taking $T^*_{n-23} = T_n - T_{23}$ 
and plotting $Q(n-23, 1)$ as shown in the lower part of the diagram. In this case, for a long 
while it appears as though a decision in favour of $H_0$ will be reached, but after $T_{54}$, or the 
31st observation since the first decision, the trend is reversed. Had we used $\alpha = \beta = 0.10$ 
a decision in favour of $H_1$ would have been made at 10,024 days, but with $\alpha = \beta = 0.05$ 
it is not until 16,051 days is reached that we should come down in favour of an 
established drop in accident rate. 

It is impossible to draw any general conclusions from this simple example, particularly 
since, when a drop occurs, it cannot be described in terms of a single value of the parameter b. 
The appearance of the spots in Fig. 1 suggests the existence of several fluctuations in rate. 
We think, however, that the illustration throws some light on the working of the test. It 
also suggests points for further investigation. We may ask whether the second decision is 
reached more quickly when the origin is not changed after the first decision, 

(a) because the fall-off in $\lambda$ is better represented by $\mu(\mu T)^a$ than by $\mu\{\mu(T - T_{23})\}^a$,

(b) or because the high initial density before $T_{23}$ still plays its part through the value of
$k$ in enhancing the term $-(k-1)\log T_k$ in the expression for $Q(n, k)$?

Another point which requires further investigation is the behaviour of the test under 
alternative trends having different mathematical forms from the one on which the sequential 
Q test is based. A beginning has been made by applying the test to two artificial series 
generated by the law A(7') = 4(1+4y7'). Details of the results are given in Bartholomew 
(1955). 
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8. SUMMARY 


A method has been given for testing whether a sequence of events is occurring at random 
in time or space when the alternative is a trend. The test is derived on the assumption that 
the alternative can be represented by a function of the form $\lambda(T) = \mu(\mu T)^a$ for $-1 < a < \infty$.
The theory of sequential analysis is used to derive the test together with its operating 
characteristic and average sample number functions. The application of the test under 
various circumstances has been discussed from the theoretical standpoint and illustrated 
on mining accident data. 


My thanks are due to Dr N. L. Johnson, who supervised this work, and to Prof. E. S.
Pearson for much valuable help in preparing the work for publication; also to the Department
of Scientific and Industrial Research for the award of a maintenance grant.
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ON THE MOMENTS OF THE MAXIMUM OF PARTIAL SUMS OF 
A FINITE NUMBER OF INDEPENDENT NORMAL VARIATES 


By A. A. ANIS 
Chelsea Polytechnic, London 


SUMMARY. The paper is concerned with the maximum $U_n$ of the partial sums $X_1$, $X_1 + X_2$, ...,
$X_1 + X_2 + \dots + X_n$ of $n$ independent standard normal variates.

The distribution of $U_n$ is of interest in the theory of storage. Suppose we have a reservoir of infinite
capacity, which receives every year a random input, from rivers, etc., whose distribution is normal
$(\mu, 1)$, and releases, for civil purposes, the mean discharge $\mu$. The probability that, starting with an
initial water level $x$, the reservoir will not run dry in the following $n$ years is given by the distribution
function $F_n(x)$ of $U_n$.

The first and second moments of the variate $U_n$ have been previously obtained (Anis & Lloyd, 1953;
Anis, 1955). Each of these moments was studied on its own merits, and no systematic method of attack
was seen at that time. In the present paper, a method for obtaining all the moments is discussed.
A recurrence relation is obtained which makes possible their numerical evaluation.




1. STATEMENT OF THE PROBLEM
Consider $n$ independent standard normal variates $X_1, X_2, \dots, X_n$, and their partial sums
$$S_r = X_1 + \dots + X_r \quad (r = 1, 2, \dots, n).$$
Let
$$U_n = \max_r \{S_r\}$$
denote the maximum of these partial sums. Our problem is to obtain the moments of $U_n$.
We shall always use the symbols $\phi(x)$, $\Phi(x)$ to denote respectively the frequency function
and the distribution function of a standard normal variate. We shall also use $F_n(x)$, $f_n(x)$ to
denote respectively the distribution function and the frequency function of $U_n$, i.e.
$$F_n(x) = \Pr(U_n < x), \qquad f_n(x) = F_n'(x).$$
We have
$$F_n(x) = \int_K \prod_{i=1}^{n} \phi(x_i)\,dx_i, \tag{1.1}$$
where the region of integration $K$ is defined by
$$K:\ \sum_{i=1}^{r} x_i < x \quad (r = 1, 2, \dots, n).$$


2. THE MOMENTS AS A LINEAR FORM OF THE $M_j(n+1)$
It may be deduced from (1.1) that
$$F_{n+1}(x) = \int_0^\infty F_n(t)\,\phi(x-t)\,dt, \tag{2.1}$$
and hence that
$$f_{n+1}(x) = F_n(0)\,\phi(x) + \int_0^\infty f_n(t)\,\phi(x-t)\,dt. \tag{2.2}$$
Also it is well known that
$$\phi(x-t) = \phi(x) \sum_{j=0}^{\infty} t^j H_j(x)/j!, \tag{2.3}$$
where $H_j(x)$ is the Hermite polynomial of degree $j$. On substituting from (2.3) into (2.2)
we get
$$f_{n+1}(x) = \phi(x) \sum_{j=0}^{\infty} H_j(x)\,M_j(n+1)/j!, \tag{2.4}$$
where
$$M_j(n+1) = \int_0^\infty t^j f_n(t)\,dt \quad (j \neq 0) \tag{2.5}$$
and $M_0(n+1) = 1$.
Multiplying both sides of (2.4) by $H_j(x)$ and integrating over the range $(-\infty, \infty)$, we obtain
$$M_j(n+1) = \int_{-\infty}^{\infty} H_j(x)\,f_{n+1}(x)\,dx. \tag{2.6}$$
Using the well-known explicit forms for the $H_j(x)$, we easily obtain the following relations:
$$\mu_1'(n+1) = M_1(n+1), \qquad \mu_2'(n+1) = 1 + M_2(n+1), \qquad \mu_3'(n+1) = 3M_1(n+1) + M_3(n+1), \ \dots \text{ etc.}, \tag{2.7}$$
where
$$\mu_j'(n+1) = \int_{-\infty}^{\infty} x^j f_{n+1}(x)\,dx. \tag{2.8}$$
Hence the problem of evaluating the moments is reduced to that of evaluating the functions
$M_j(n+1)$.
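3. $M_j(n+1)$ AS A LINEAR FORM IN THE $F_r(0)$

On applying equation (2.2) to reduce $f_n(t)$ to $f_{n-1}(t)$, we obtain
$$M_j(n+1) = F_{n-1}(0) \int_0^\infty y_1^j\,\phi(y_1)\,dy_1 + \int_0^\infty\!\!\int_0^\infty y_1^j\,\phi(y_1 - y_2)\,f_{n-1}(y_2)\,dy_1\,dy_2.$$
Continuing this process of reduction from $f_{n-1}$ to $f_{n-2}$, ..., etc., we finally get
$$M_j(n+1) = \sum_{r=1}^{n} a(r, j)\,F_{n-r}(0), \tag{3.1}$$
where
$$a(r, j) = \int_0^\infty \!\cdots\! \int_0^\infty y_1^j\,\phi(y_1 - y_2)\,\phi(y_2 - y_3) \cdots \phi(y_{r-1} - y_r)\,\phi(y_r) \prod_{i=1}^{r} dy_i \quad (r \ge 2), \qquad a(1, j) = \int_0^\infty y_1^j\,\phi(y_1)\,dy_1. \tag{3.2}$$
It may be appropriate to recall at this stage an important lemma, proved in a previous paper
(Anis, 1955), which we need in the sequel. This is
$$\sum_{r=0}^{\infty} F_r(0)\,t^r = (1-t)^{-1/2}. \tag{3.3}$$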
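
As a small numerical aside on the lemma (3.3): the coefficient of $t^r$ in $(1-t)^{-1/2}$ is $\binom{2r}{r}/4^r$, so the lemma amounts to $F_r(0) = \binom{2r}{r}/4^r$. The sketch below (the function name is hypothetical) tabulates these values and checks that the series squared has all coefficients equal to one, as $(1-t)^{-1}$ requires.

```python
from math import comb

def f_r_zero(r_max):
    """Coefficients of (1 - t)^(-1/2), i.e. F_r(0) for r = 0..r_max under lemma (3.3)."""
    return [comb(2 * r, r) / 4**r for r in range(r_max + 1)]

coeffs = f_r_zero(6)
# Cauchy product of the series with itself: every coefficient should equal 1.
squares = [sum(coeffs[k] * coeffs[n - k] for k in range(n + 1)) for n in range(7)]
print(coeffs[:4], squares)
```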


4. A RECURRENCE RELATION FOR THE $a(r, j)$

In an earlier paper (Anis & Lloyd, 1953) the $(s-1)$-fold integral
$$b_s = \int_0^\infty \!\cdots\! \int_0^\infty \phi(y_1)\,\phi(y_1 - y_2) \cdots \phi(y_{s-2} - y_{s-1})\,\phi(y_{s-1}) \prod_{i=1}^{s-1} dy_i = (2\pi s^3)^{-1/2} \tag{4.1}$$
was discussed. Here we shall show that the whole of the results depend on this integral. We make use
in (3.2) of the identity
$$y_1 = \sum_{s=1}^{r} (r - s + 1)(2y_s - y_{s-1} - y_{s+1}) \tag{4.2}$$
(with the convention that $y_0 = y_1$, $y_{r+1} = 0$); hence
$$a(r, j) = \sum_{s=1}^{r} \int_0^\infty \!\cdots\! \int_0^\infty (r - s + 1)(2y_s - y_{s-1} - y_{s+1})\,y_1^{j-1}\,\phi(y_1 - y_2) \cdots \phi(y_{r-1} - y_r)\,\phi(y_r) \prod_{i=1}^{r} dy_i. \tag{4.3}$$
The first term of this sum may be integrated by parts and has the value
$$r(j-1)\,a(r, j-2) \quad (j \ge 2).$$
The other terms may also be integrated by parts; the presence of the terms $(2y_s - y_{s-1} - y_{s+1})$
has the effect of dividing every integral into two multiplied parts, one of the form $a(r, j)$ for
some $r$, $j$ and the other of the form $b_s$ for some $s$. This leads us directly to the recurrence
relation
$$a(r, j) = r(j-1)\,a(r, j-2) + \sum_{s=1}^{r-1} (r-s)\,b_{r-s}\,a(s, j-1). \tag{4.4}$$
If $j = 1$, we get
$$a(r, 1) = \sum_{s=1}^{r} s\,b_s\,F_{r-s}(0); \tag{4.5}$$
the introduction of $F_s(0)$ in this last relation is due to the fact that
$$F_s(0) = \int_0^\infty \!\cdots\! \int_0^\infty \phi(y_1 - y_2) \cdots \phi(y_{s-1} - y_s)\,\phi(y_s) \prod_{i=1}^{s} dy_i, \tag{4.6}$$
i.e. $F_s(0) = a(s, 0)$. If we define $c_r$ as
$$c_r = r b_r = (2\pi r)^{-1/2}, \tag{4.7}$$
then (4.4) and (4.5) may be written in the form
$$a(r, j) = r(j-1)\,a(r, j-2) + \sum_{s=1}^{r-1} c_{r-s}\,a(s, j-1) \quad (j \ge 2), \qquad a(r, 1) = \sum_{s=1}^{r} c_s\,F_{r-s}(0). \tag{4.8}$$

We now have the position that the required moments $\mu_j'(n)$ of the original variate $U_n$ are
known (2.7) in terms of the functions $M_j(n)$, which are themselves known (3.1) in terms of
the functions $a(r, j)$; and we have a recurrence relation for the $a(r, j)$ involving only the known
coefficients $c_r$. We shall not require the values of these functions $a(r, j)$; in the next section
we replace the recurrence relation for $a(r, j)$ by a difference-differential equation for their
generating functions, and show that from these we may construct a generating function for
the $M_j(n)$.


5. A DIFFERENCE-DIFFERENTIAL EQUATION FOR THE GENERATING FUNCTION OF $a(r, j)$

Let us define the generating function $Q_j(t)$ as follows:
$$Q_j(t) = \sum_{r=1}^{\infty} a(r, j)\,t^r \quad (j \ge 1), \qquad Q_0(t) = \sum_{r=0}^{\infty} a(r, 0)\,t^r = \sum_{r=0}^{\infty} F_r(0)\,t^r = (1-t)^{-1/2} \tag{5.1}$$
(using the lemma of (3.3)). From (4.8) we know that
$$\begin{aligned} a(1, j) &= (j-1)\,a(1, j-2), \\ a(2, j) &= 2(j-1)\,a(2, j-2) + a(1, j-1)\,c_1, \\ a(3, j) &= 3(j-1)\,a(3, j-2) + a(1, j-1)\,c_2 + a(2, j-1)\,c_1, \text{ etc.} \end{aligned} \tag{5.2}$$
If we multiply both sides of the first equation by $t$, the second by $t^2$, and so on, and add the
terms $a(s, j-2)$ vertically but those of $a(s, j-1)$ diagonally, we get
$$Q_j(t) = (j-1)\,t\,Q_{j-2}'(t) + \alpha(t)\,Q_{j-1}(t) \quad (j \ge 2), \tag{5.3}$$
where
$$\alpha(t) = \sum_{r=1}^{\infty} c_r t^r. \tag{5.4}$$
$\alpha(t)$ is absolutely convergent for all values of $t$ in the range $|t| < 1$. If $j = 1$, then the second
relation of (4.8) would lead us to
$$Q_1(t) = \alpha(t)\,Q_0(t). \tag{5.5}$$
We may observe with the aid of (3.1) that
$$Q_0(t)\,Q_j(t) = (1-t)^{-1/2}\,Q_j(t) = \sum_{n=1}^{\infty} \Big\{ \sum_{r=1}^{n} a(r, j)\,F_{n-r}(0) \Big\} t^n,$$
or
$$Q_0(t)\,Q_j(t) = \sum_{n=1}^{\infty} M_j(n+1)\,t^n, \tag{5.6}$$
i.e. $Q_0(t)\,Q_j(t)$ is the generating function of $M_j(n+1)$. We have earlier reduced the question
of evaluating the moments to that of evaluating $M_j(n+1)$. Equation (5.6) reduces this last
question to that of obtaining $Q_0(t)\,Q_j(t)$.

It may be appropriate here to work out, as examples, the explicit expressions of $M_1$,
$M_2$, $M_3$.

The value of $Q_1(t)$ is given by (5.5); $Q_2(t)$ and $Q_3(t)$ could be obtained directly from (5.3)
as functions of $\alpha(t)$, $Q_0(t)$ and their derivatives. By this method we get
$$\begin{aligned} Q_0(t)\,Q_1(t) &= (1-t)^{-1}\,\alpha(t), \\ Q_0(t)\,Q_2(t) &= \tfrac12 t\,(1-t)^{-2} + (1-t)^{-1}\,\alpha^2(t), \\ Q_0(t)\,Q_3(t) &= \tfrac32 t\,(1-t)^{-2}\,\alpha(t) + 2t\,(1-t)^{-1}\,\alpha'(t) + (1-t)^{-1}\,\alpha^3(t). \end{aligned}$$
Equating the coefficient of $t^n$ on both sides of these equations we get
$$\begin{aligned} M_1(n+1) &= (2\pi)^{-1/2} \sum_{r=1}^{n} r^{-1/2}, \\ M_2(n+1) &= \tfrac12 n + (2\pi)^{-1} \sum_{r=1}^{n-1} \sum_{s=1}^{r} \{s(r-s+1)\}^{-1/2}, \\ M_3(n+1) &= \tfrac32 n\,(2\pi)^{-1/2} \sum_{r=1}^{n} r^{-1/2} + \tfrac12 (2\pi)^{-1/2} \sum_{r=1}^{n} r^{1/2} + (2\pi)^{-3/2} \sum_{r=1}^{n-2} \sum_{s=1}^{r} \sum_{k=1}^{s} \{k(s-k+1)(r-s+1)\}^{-1/2}. \end{aligned} \tag{5.7}$$
The values of $M_1$, $M_2$ and $M_3$ were computed from these formulae. The third term of $M_3$ was
computed using the recurrence relation
$$p_3(n) = \sum_{r=1}^{n} r^{-1/2}\,p_2(n-r+1), \tag{5.8}$$
where
$$p_2(n) = \sum_{r=1}^{n} \sum_{s=1}^{r} \{s(r-s+1)\}^{-1/2} \tag{5.9}$$
and
$$p_3(n) = \sum_{r=1}^{n} \sum_{s=1}^{r} \sum_{k=1}^{s} \{k(s-k+1)(r-s+1)\}^{-1/2}. \tag{5.10}$$
Since the values of $p_2(r)$ are available from the computation of $M_2$, this recurrence relation
facilitates the computation of $p_3(r)$ (and hence $M_3$) considerably. Unfortunately, this
procedure could not be continued much further since the formulae rapidly become
unmanageable.

In the following section we show how to obtain $M_j(n+1)$ for $j > 3$, and hence the moments
of $U_n$, from recurrence relations which depend on the values of $M_1$, $M_2$ and $M_3$ obtained above.

We may also mention that an explicit solution may be found for the difference-differential 
equation (5-3). Details will be published elsewhere. 


6. A RECURRENCE RELATION IN $M_j(n+1)$

We have shown that
$$Q_0(t)\,Q_j(t) = \sum_{n=1}^{\infty} M_j(n+1)\,t^n.$$
Hence
$$Q_j(t) = (1-t)^{1/2} \sum_{n=1}^{\infty} M_j(n+1)\,t^n, \tag{6.1}$$
and
$$Q_j'(t) = -\tfrac12 (1-t)^{-1/2} \sum_{n=1}^{\infty} M_j(n+1)\,t^n + (1-t)^{1/2} \sum_{n=1}^{\infty} n\,M_j(n+1)\,t^{n-1}. \tag{6.2}$$
On substituting from (6.1), (6.2) into (5.3), and equating the coefficient of $t^n$ on both sides
of the equation we obtain the recurrence relation
$$M_j(n+1) = \sum_{r=1}^{n-1} c_r\,M_{j-1}(n-r+1) + (j-1)\,n\,M_{j-2}(n+1) - \tfrac12 (j-1) \sum_{r=1}^{n-1} M_{j-2}(r+1). \tag{6.3}$$
This recurrence relation finally gives us a practical method for the numerical computation of
the moments. The computation of $M_4(n+1)$ was done by using (6.3), taking for $M_1$, $M_2$ and
$M_3$ the values given in §5. From these it is a very small step to the first four moments of $U_n$.
The following table gives for $n = 2(1)15$ the mean value of $U_n$; the moments about the mean,
$\mu_2$, $\mu_3$, $\mu_4$; and the moment ratios $\gamma_1 = \mu_3/\mu_2^{3/2}$, $\gamma_2 = \mu_4/\mu_2^2 - 3$. It will be seen that as $n$ increases,
the distribution becomes increasingly asymmetrical and leptokurtic.


Short table of the first four moments of $U_n$, $n \ge 2$

  n     μ'_1      μ_2       μ_3        μ_4        γ_1      γ_2
  2    0.3989    1.3408    0.3265     5.6733    0.2103   0.1556
  3    0.6810    1.6953    0.7881     9.3981    0.3570   0.2698
  4    0.9114    2.0536    1.3540    14.1469    0.4601   0.3544
  5    1.1108    2.4136    2.0075    19.9160    0.5354   0.4186
  6    1.2893    2.7745    2.7385    26.7048    0.5926   0.4691
  7    1.4521    3.1360    3.5392    34.5131    0.6373   0.5094
  8    1.6029    3.4978    4.4042    43.3413    0.6732   0.5425
  9    1.7440    3.8599    5.3289    53.1895    0.7027   0.5700
 10    1.8769    4.2222    6.3098    64.0580    0.7273   0.5932
 11    2.0031    4.5847    7.3438    75.9468    0.7481   0.6132
 12    2.1234    4.9473    8.4282    88.8562    0.7660   0.6304
 13    2.2385    5.3100    9.5609   102.7862    0.7814   0.6455
 14    2.3492    5.6727   10.7400   117.7369    0.7949   0.6588
 15    2.4558    6.0355   11.9634   133.7086    0.8068   0.6705
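
The computation described in this section can be sketched as follows. This is an illustrative reconstruction rather than the author's program: the function name is hypothetical, $M_1$ and $M_2$ are taken from the explicit sums of §5, $M_3$ and $M_4$ from the recurrence (6.3), and the central moments from the linear relations (2.7) between the $M_j$ and the moments about the origin.

```python
from math import pi, sqrt

def moments_of_max(n_max):
    """Return {n: (mean, mu2, mu3, mu4)} for U_n, n = 2..n_max."""
    N = n_max - 1                                   # M_j(n + 1) is needed for n = 1..N
    c = [0.0] + [1.0 / sqrt(2.0 * pi * r) for r in range(1, N + 1)]   # c_r of (4.7)
    M = {j: [0.0] * (N + 1) for j in (1, 2, 3, 4)}  # M[j][n] stores M_j(n + 1)
    for n in range(1, N + 1):
        M[1][n] = sum(c[1:n + 1])
        # double sum of (5.7), written as a convolution of the c_r
        M[2][n] = 0.5 * n + sum(c[s] * c[m - s]
                                for m in range(2, n + 1) for s in range(1, m))
    for j in (3, 4):                                # recurrence (6.3)
        for n in range(1, N + 1):
            M[j][n] = (sum(c[r] * M[j - 1][n - r] for r in range(1, n))
                       + (j - 1) * n * M[j - 2][n]
                       - 0.5 * (j - 1) * sum(M[j - 2][r] for r in range(1, n)))
    table = {}
    for n in range(1, N + 1):
        m1, m2, m3, m4 = (M[j][n] for j in (1, 2, 3, 4))
        mu2 = m2 - m1**2 + 1.0
        mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
        mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4 + 6 * m2 - 6 * m1**2 + 3.0
        table[n + 1] = (m1, mu2, mu3, mu4)
    return table

for n, row in moments_of_max(4).items():
    print(n, [round(v, 4) for v in row])
```

The moment ratios follow as `mu3 / mu2**1.5` and `mu4 / mu2**2 - 3`.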
7. THE LIMITING VALUES OF $\gamma_1$, $\gamma_2$

From the linear relations between the $M_j$'s and the moments of $U_n$ about the origin, it is
easy to obtain the following relations:
$$\mu_3 = M_3 - 3M_1M_2 + 2M_1^3, \qquad \mu_4 = M_4 - 4M_1M_3 + 6M_1^2M_2 - 3M_1^4 + 6M_2 - 6M_1^2 + 3. \tag{7.1}$$
Hence
$$\gamma_1 = \frac{M_3 - 3M_1M_2 + 2M_1^3}{(M_2 - M_1^2 + 1)^{3/2}}, \qquad \gamma_2 = \frac{M_4 - 4M_1M_3 + 6M_1^2M_2 - 3M_1^4 + 6M_2 - 6M_1^2 + 3}{(M_2 - M_1^2 + 1)^2} - 3. \tag{7.2}$$
Now it may be proved that
$$\lim_{n \to \infty} n^{-j/2} M_j = 2^{j/2}\,\Gamma\!\left(\frac{j+1}{2}\right) \Big/ \Gamma\!\left(\tfrac12\right). \tag{7.3}$$
This may be done by induction, using the recurrence relation (6.3) of the $M_j$'s, and substituting
for $M_{j-1}$, $M_{j-2}$ their asymptotic values obtained from (7.3) for sufficiently large $n$.
The summation sign in the recurrence relation would be replaced by an integral sign in the
course of the proof. Since (7.3) is true for $j = 1, 2$ (Anis & Lloyd, 1953; Anis, 1955) it is in
general true.

Computing these limits for $j = 3, 4$ from (7.3) we may proceed easily to obtain the limiting
values of $\gamma_1$, $\gamma_2$ when $n$ tends to infinity. These are


2/4 2\3 
me i — =)" ~ 0-992 
" dale iN/( 7) i 
8 / 3 2\3 . 
v= 2(1-2) (1-2) ~ 0-875. 


I am grateful to Prof. E. S. Pearson for drawing my attention to the question of the
limiting values of $\gamma_1$ and $\gamma_2$. I am also indebted to Mr S. Michaelson (Imperial College) for
his part in the computation.
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ON THE APPLICATION TO STATISTICS OF AN ELEMENTARY 
THEOREM IN PROBABILITY 


By H. A. DAVID 


Commonwealth Scientific and Industrial Research Organization, Sydney, 
Australia, and University of Melbourne 


1. INTRODUCTION
Let $P_{ij\dots k}$ denote the joint probability of $r$ $(\le n)$ events $A_i, A_j, \dots, A_k$; and $S_r$ the sum of the
$\binom{n}{r}$ $P$'s with $r$ different subscripts. Then the probability $p_{m,n}$ of the realization of at least
$m$ events out of $n$ is given by (see, for example, Feller, 1950, p. 74)
$$p_{m,n} = \sum_{l=0}^{n-m} (-1)^l \binom{m+l-1}{m-1} S_{m+l}. \tag{1}$$

We shall be concerned with various statistical applications of this theorem. Let $y_1, y_2, \dots, y_n$
be a set of random variables not necessarily independent or identically distributed, and
$y_{(1)}, y_{(2)}, \dots, y_{(n)}$ the same variates arranged in descending order of magnitude. We may now
write $P_{ij\dots k}(Y)$ for the joint probability of the $r$ events $y_i > Y, y_j > Y, \dots, y_k > Y$. Correspondingly,
$p_{m,n}(Y)$ will be the probability that $y_{(m)}$ exceeds $Y$. In the special case when the
joint distribution of the $y$'s is a symmetric function of the $y$'s we have
$$S_r = \binom{n}{r} P_{12\dots r}(Y),$$
and (1) takes the form
$$p_{m,n}(Y) = \Pr(y_{(m)} > Y) = \sum_{l=0}^{n-m} (-1)^l \binom{n}{m+l} \binom{m+l-1}{m-1} P_{12\dots m+l}(Y). \tag{2}$$
In particular, for the important case $m = 1$ we have simply
$$\Pr(y_{(1)} > Y) = \sum_{l=1}^{n} (-1)^{l-1} \binom{n}{l} P_{12\dots l}(Y). \tag{3}$$

Equation (3) has in fact been used by Cochran (1941) to determine the distribution of the
ratio of the largest to the sum of a group of sample variances all following a $\chi^2$ law with the
same number of degrees of freedom. However, (3) and its generalizations are of much wider
applicability as many important statistics are expressible as maxima. Examples are the 
maximum F-ratio (the largest of a set of F-ratios with a common denominator); the range, 
the extreme deviate from the sample mean, and their studentized forms. The method is 
particularly convenient for the determination of upper percentage points, on the assumption 
of a normal parent population.* In a number of cases it permits also the evaluation of 
a power function, but the interpretation of this function needs a little care in the present 
context. 

It may be mentioned here that since this paper was first submitted for publication an 
approach very close to ours has been employed by Halperin, Greenhouse, Cornfield & 


* Similar remarks apply, of course, to lower percentage points of statistics expressible as minima 
(cf. Hartley, 1938). 




Zalokar (1955) to obtain upper percentage points for the ‘studentized maximum absolute 
deviate in normal samples’. Some of the criteria listed above will now be considered in 
more detail. 

2. THE EXTREME DEVIATE FROM THE SAMPLE MEAN

Let $x_t$ $(t = 1, \dots, n)$ be $n$ independent normal variates with unit standard deviation. Applying
equation (3) to $y_t = x_t - \bar{x}$ we have
$$\begin{aligned} \Pr(x_{\max} - \bar{x} > Y) &= \binom{n}{1} \Pr(x_1 - \bar{x} > Y) - \binom{n}{2} \Pr(x_1 - \bar{x} > Y,\ x_2 - \bar{x} > Y) + \dots \\ &= T_1(Y) - T_2(Y) + \dots + (-1)^{n-1} T_n(Y) \quad \text{(say)}. \end{aligned} \tag{4}$$
Clearly, for reasonably large values of $Y$ a first approximation to the left-hand side is
provided by $T_1$, with $T_2, T_3, \dots$ as successive correction terms.* This first approximation,
namely
$$\Pr(x_{\max} - \bar{x} > Y) \doteq n \int_{Y\{n/(n-1)\}^{1/2}}^{\infty} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt,$$
was suggested by McKay (1935), but its remarkable accuracy for the determination of the
usual range of upper percentage points was established only when the exact values were
tabulated for $n \le 25$ (Nair, 1948b; Grubbs, 1950). The position is easily appreciated from (4)
in which $T_2$ may be expected to be small for large $Y$ since $x_1 - \bar{x}$, $x_2 - \bar{x}$ are negatively
correlated. $T_2$ can, of course, be evaluated from tables of the volume under the normal
bivariate surface. However, without doing this it is possible to obtain a simple gauge for the
accuracy of $Y_1(\alpha)$, the first term approximation to the upper $100\alpha\%$ point $Y_\alpha$ of $x_{\max} - \bar{x}$.
By Bonferroni's inequalities (Feller, 1950)
$$T_1 - T_2 < \Pr(x_{\max} - \bar{x} > Y) < T_1.$$
But for $Y = Y_1$ we have $T_1 = \alpha$ and $T_2 < \tfrac12 n(n-1)[\Pr(x_1 - \bar{x} > Y_1)]^2$, so that
$$\alpha - \tfrac12 (n-1)\alpha^2/n < \Pr(x_{\max} - \bar{x} > Y_1) < \alpha.$$
We have therefore that for all $n$ and $\alpha = 0.05, 0.01$
$$0.04875 < \Pr(x_{\max} - \bar{x} > Y_1) < 0.05, \qquad 0.00995 < \Pr(x_{\max} - \bar{x} > Y_1) < 0.01.$$
A second approximation $Y_2(\alpha)$, which underestimates the true value, is given by solving for
$Y_2$ in
$$T_1(Y_2) = \alpha + T_2(Y_2).$$
This is not very convenient, but we may replace $T_2(Y_2)$ by $\tfrac12 (n-1)\alpha^2/n$ to obtain a simple
and generally very accurate second approximation.

* If (4) is applied to the ratio of $x_{\max} - \bar{x}$ to the standard deviation of the same sample of $n$, then
since this ratio is bounded, $T_1$ will give the left-hand side exactly for sufficiently large values of $Y$ (cf.
Pearson & Chandra Sekar, 1936).

2.1. The power function of the extreme deviate test
In using $x_{\max} - \bar{x}$ to decide whether an outlying observation should be rejected one
generally has in mind the following:
Null-hypothesis $H_0$: The $x_t$ $(t = 1, \dots, n)$ are all drawn from a common normal population
with unknown mean and known variance which may be taken as unity.
Alternative hypothesis $H_1$: One or possibly a few of the $x_t$ come from a normal population
with increased mean $\mu + \lambda$ and variance unity, the remaining $x_t$ belonging to the above
'null' population. There could also conceivably be more than one non-null population.

It is possible to apply the test sequentially, that is, having found that the largest observation
should be rejected one may proceed to test the second largest, and so on. Such tests
have recently been discussed by Hartley (1955),* but we shall confine ourselves to the
situation when a single test is carried out. If we take the event $A_t$ to be that $x_t - \bar{x} > Y_\alpha$, then
equation (1) will give for $m = 1$ the probability of establishing significance. This will now
be considered in detail when $H_1$ specifies only one of the $x_t$, say $x_1$, as a true outlier. We have
$$\begin{aligned} P(\lambda) &= \Pr(x_{\max} - \bar{x} > Y_\alpha \mid H_1) \\ &= (n-1)\Pr(u_1 > Y_\alpha') + \Pr(u_1 > Y_\alpha' - \lambda') - (n-1)\Pr(u_1 > Y_\alpha' - \lambda',\ u_2 > Y_\alpha') - \dots, \end{aligned} \tag{6}$$
where $Y_\alpha' = [n/(n-1)]^{1/2}\,Y_\alpha$, $\lambda' = [n/(n-1)]^{1/2}\,\lambda$,
and $u_1$, $u_2$ are unit normal variates with correlation $\rho = -1/(n-1)$. From the discussion in
the previous section it will be clear that the terms not shown in (6) are negligible if $P(\lambda)$ is
required only to moderate accuracy. For $\alpha = 0.05$ Table 1 shows $P(\lambda)$ for selected values of
$n \le 25$ and for $\lambda = 1, 2, 3, 4$. The last term of (6) was calculated with the aid of tables given
by K. Pearson (1931) and Nicholson (1943).
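
A rough numerical sketch of the first-term approximation and of the leading terms of (6) may help fix ideas. It rests on two labelled assumptions: the exact point $Y_\alpha$ is replaced by the first approximation $Y_1(\alpha)$, and the bivariate third term of (6) is simply omitted, so the result slightly overstates $P(\lambda)$, more noticeably as $n$ grows. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # unit normal distribution

def first_approx_point(n, alpha):
    """Y_1(alpha): the Y solving n * Pr(u > Y * sqrt(n / (n - 1))) = alpha."""
    z = _STD.inv_cdf(1.0 - alpha / n)
    return z / sqrt(n / (n - 1))

def power_leading_terms(n, lam, alpha):
    """First two terms of (6), using Y_1(alpha) in place of Y_alpha (an assumption)."""
    scale = sqrt(n / (n - 1))
    y_dash = scale * first_approx_point(n, alpha)   # Y'_alpha
    lam_dash = scale * lam                          # lambda'
    upper_tail = lambda z: 1.0 - _STD.cdf(z)
    return (n - 1) * upper_tail(y_dash) + upper_tail(y_dash - lam_dash)

print(round(first_approx_point(3, 0.05), 3))
print(round(power_leading_terms(3, 1.0, 0.05), 3))
```

With $n = 3$ and $\lambda = 1$ the two retained terms give a value close to the corresponding entry of Table 1.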


Table 1. Probability $P(\lambda)$ of rejecting the largest of $n$ observations by the use of $x_{\max} - \bar{x}$ at the
5% level of significance when all observations are normal with unit variance, $n - 1$ have
mean $\mu$ and one has mean $\mu + \lambda$

  n     λ = 1    λ = 2    λ = 3    λ = 4
  3     0.216    0.654    0.951    0.999
  4     0.175    0.557    0.902    0.993
  5     0.153    0.496    0.862    0.987
  6     0.138    0.453    0.829    0.980
  7     0.127    0.420    0.802    0.973
  8     0.120    0.395    0.778    0.966
  9     0.113    0.374    0.758    0.960
 10     0.108    0.357    0.740    0.954
 12     0.101    0.329    0.710    0.943
 15     0.093    0.300    0.674    0.929
 20     0.085    0.266    0.630    0.909
 25     0.080    0.244    0.598    0.894


We turn now to the interpretation of $P(\lambda)$. It is a power function as it gives the probability
of rejecting $H_0$ when $H_1$ is in fact true, and since $P(0) = \alpha$. $P(\lambda)$ is therefore of interest as an
indication of the power performance to be expected from the test when the experimenter is
merely concerned with receiving a warning about the 'pollution' of his data by at least one
rogue observation without wishing to identify this rogue. On the other hand, it is not equal
to the probability of rightly inferring that there is a rogue observation and identifying it
correctly, this being the joint probability that the largest observation is a true outlier and
that it exceeds $Y_\alpha$. However, as $\lambda \to 0$ this joint probability tends to approximately $\alpha/n$ and
is therefore not a power function, although possessing desirable properties. It is, moreover,
difficult to evaluate but will always differ from $P(\lambda)$ by less than $\alpha$, approaching $P(\lambda)$ (or
$\Pr(u_1 > Y_\alpha' - \lambda')$) as $\lambda$ increases.

* See also end of §3.1.


3. THE MAXIMUM F-RATIO
Let $s_t^2/s^2$ $(t = 1, \dots, n)$ denote the ratio of two sample variances, which follows an F-distribution
with $\nu_t$, $\nu$ degrees of freedom. Then the maximum F-ratio is $s^2_{\max}/s^2$. The importance of this
statistic in overcoming the effect of 'selecting' the largest among a number of F-ratios is
well known (see, for example, Pearson & Hartley, 1954, p. 39). The cases $\nu_t = 1$, $\nu_t = 2$
have been investigated in detail by Nair (1948a) and Finney (1941), respectively, but by
different methods from ours. Finney also indicates the solution for other even $\nu_t$. We shall
apply equation (3) to the problem.
Take $y_t = s_t^2/s^2$. Then
$$\begin{aligned} P_{ij\dots k}(Y) &= \Pr\!\left(\frac{s_i^2}{s^2} > Y,\ \frac{s_j^2}{s^2} > Y,\ \dots,\ \frac{s_k^2}{s^2} > Y\right) \\ &= \int_0^\infty \Pr(\chi_i^2 > 2Y_i x,\ \chi_j^2 > 2Y_j x,\ \dots,\ \chi_k^2 > 2Y_k x)\,p(x)\,dx, \end{aligned}$$
where $\chi_t^2$ is distributed as $\chi^2$ with $\nu_t$ degrees of freedom,
$$2Y_t = \nu_t Y/\nu,$$
and
$$p(x) = 2^{-\nu/2}\,\Gamma^{-1}(\tfrac12\nu)\,x^{\nu/2 - 1} e^{-x/2} \quad (x > 0).$$
Writing $Q_t(z) = \Pr(\chi_t^2 > z)$ we have
$$\begin{aligned} P_{ij\dots k}(Y) &= \int_0^\infty \prod_{t = i, j, \dots, k} Q_t(2Y_t x)\,p(x)\,dx \\ &= \int_0^\infty \prod_{t = i, j, \dots, k} \left\{ e^{-Y_t x} \left[ 1 + Y_t x + \dots + \frac{(Y_t x)^{\nu_t/2 - 1}}{(\tfrac12\nu_t - 1)!} \right] \right\} p(x)\,dx \end{aligned} \tag{7}$$
if the $\nu_t$ are even. (7) may be evaluated by termwise integration when the integrand has
been multiplied out.
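
The termwise integration of (7) can be sketched for the case of equal, even numerator degrees of freedom. This is a minimal illustration, not the computation used for the tables: the function names are hypothetical, each power of $x$ is integrated against $p(x)$ with a gamma integral, and the result is combined through equation (3). For $\nu_t = 2$ the single-ratio tail has the closed form $(1 + 2Y/\nu)^{-\nu/2}$, which is used below to obtain the 5% point.

```python
from math import comb, exp, factorial, lgamma, log

def joint_exceedance(l, nu1, nu, Y):
    """P_{12...l}(Y): l F-ratios with a common denominator all exceed Y (nu1 even)."""
    if nu1 % 2:
        raise ValueError("numerator degrees of freedom must be even")
    a = nu1 * Y / (2.0 * nu)                          # Y_t in the notation of (7)
    base = [a**k / factorial(k) for k in range(nu1 // 2)]
    poly = [1.0]                                      # coefficients of the bracket to the power l
    for _ in range(l):
        new = [0.0] * (len(poly) + len(base) - 1)
        for i, p in enumerate(poly):
            for j, b in enumerate(base):
                new[i + j] += p * b
        poly = new
    rate, half_nu = l * a + 0.5, nu / 2.0
    total = 0.0
    for k, coef in enumerate(poly):
        # integral of x^(k + nu/2 - 1) exp(-rate * x) dx, times the constant of p(x)
        log_term = (lgamma(k + half_nu) - (k + half_nu) * log(rate)
                    - half_nu * log(2.0) - lgamma(half_nu))
        total += coef * exp(log_term)
    return total

def prob_max_f_exceeds(n, nu1, nu, Y):
    """Equation (3) applied to n exchangeable F-ratios."""
    return sum((-1) ** (l - 1) * comb(n, l) * joint_exceedance(l, nu1, nu, Y)
               for l in range(1, n + 1))

# 3 x 3 randomized block: two F-ratios on (2, 4) degrees of freedom.
f_05 = (4 / 2) * (0.05 ** (-2 / 4) - 1)               # upper 5% point of F(2, 4)
print(round(prob_max_f_exceeds(2, 2, 4, f_05), 4))
```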


3-1. Case of equal numerator degrees of freedom 

We consider the following experimental designs of size $l \times l$: (i) the randomized block
experiment with equal numbers of treatments and blocks, (ii) the Latin square and (iii) the
Graeco-Latin square. Actually, the use of the maximum F-ratio is perhaps even more
important in factorial experiments in which a rather larger number of 'treatment' mean
squares is tested against a common error mean square. But the designs chosen will serve to
illustrate certain interesting points and have the convenience of allowing a fairly systematic
tabulation. It should also be noted that in the sequel all numerator mean squares are
treated on terms of equality, although it is often more appropriate to apply the method only
to those mean squares corresponding to factors of real interest and not to 'extraneous'
factors such as block effects.
Table 2 shows the probabilities, obtained from (2) and (7), with which the ordered F-ratios
$F_{(t)}$ exceed the 5 and 1% significance levels $F_\alpha$ of the corresponding random F-ratio. As is to
be expected, the term $\Pr(F_{(1)} > F_\alpha)$ accounts for by far the largest portion of the 'total
Type I error', viz. $2\alpha$, $3\alpha$, $4\alpha$, respectively. This term increases with $l$ for any given design.
The $\infty$-values shown are identical, for any $t$, with the probabilities obtained by assuming


Table 2. Showing the probability, for various designs of size $l \times l$, with which the ordered
F-ratio $F_{(t)}$ exceeds the upper percentage point $F_\alpha$ of the corresponding random F-ratio

                       α = 0.05                               α = 0.01
  l    t    Randomized   Latin     Graeco-Latin    Randomized   Latin     Graeco-Latin
            block        square    square          block        square    square
  3    1    0.0842       0.0903    -               0.0172       0.0183    -
       2    0.0158       0.0424    -               0.0028       0.0084    -
       3    -            0.0172    -               -            0.0034    -
  5    1    0.0921       0.1238    0.1398          0.0191       0.0265    0.0306
       2    0.0079       0.0230    0.0434          0.0009       0.0031    0.0071
       3    -            0.0032    0.0137          -            0.0003    0.0019
       4    -            -         0.0031          -            -         0.0004
  7    1    0.0944       0.1321    0.1623          0.0195       0.0283    0.0359
       2    0.0056       0.0165    0.0315          0.0005       0.0016    0.0036
       3    -            0.0014    0.0055          -            0.0001    0.0004
       4    -            -         0.0006          -            -         0.0000
  9    1    0.0953       0.1356    0.1706          0.0197       0.0288    0.0374
       2    0.0047       0.0136    0.0259          0.0003       0.0011    0.0024
       3    -            0.0008    0.0032          -            0.0000    0.0002
       4    -            -         0.0002          -            -         0.0000
  ∞    1    0.0975       0.1426    0.1855          0.0199       0.0297    0.0394
       2    0.0025       0.0072    0.0140          0.0001       0.0003    0.0006
       3    -            0.0001    0.0005          -            0.0000    0.0000
       4    -            -         0.0000          -            -         0.0000





independence among the variance ratios (Hartley, 1938) and apply to any n-factor design 
(n = 2,3, 4) with large error degrees of freedom. 
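
The $\infty$-rows of Table 2 offer a convenient check on formula (2), because under independence the joint probability of $r$ exceedances is simply $\alpha^r$. The sketch below (function names hypothetical) evaluates (2) with that substitution; the result is the upper tail of a binomial distribution.

```python
from math import comb

def prob_at_least_m(n, m, joint):
    """Formula (2): Pr(at least m of n exchangeable events), with joint(r) = P_{12...r}."""
    return sum((-1) ** l * comb(n, m + l) * comb(m + l - 1, m - 1) * joint(m + l)
               for l in range(n - m + 1))

def independent_limit(n, t, alpha):
    """Pr(F_(t) > F_alpha) for n independent ratios, each exceeding with probability alpha."""
    return prob_at_least_m(n, t, lambda r: alpha ** r)

for n in (2, 3, 4):
    print(n, [round(independent_limit(n, t, 0.05), 4) for t in range(1, n + 1)])
```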

In Table 3 we give the upper 5 and 1% points of the maximum F-ratio in the above cases
and also for $l = 6, 8$. Below each exact figure is shown the $100\alpha/n\,\%$ point of the corresponding
random F-ratio, this being a simple conservative approximation due to Hartley
(1949). The 'exact' values for $l = 6, 8$ were in fact inserted by interpolation with the
approximation as an auxiliary function, and may be in error by a few units in the second
decimal place. If the error degrees of freedom $\nu$ are noted it will be found, as might be
expected, that Hartley's approximation increases in accuracy as $\nu$ increases ($n$ constant),
and as $n$ decreases ($\nu$ constant). The usefulness of the approximation, except for very small $\nu$,
is obvious for the cases treated in Table 3. Moreover, in many other situations (including
the case of 'extraneous' factors mentioned above) the demands made on the approximation
will be less severe.

If the approximation is accepted a very simple procedure permits the complete analysis
of the experiment (Hartley, 1955):

Test $F_{(1)}$ by referring to the $100\alpha/n\,\%$ level of the corresponding F-ratio. If $F_{(1)}$ is not
significant stop; if significant test $F_{(2)}$ at the $100\alpha/(n-1)\,\%$ level, etc.
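
The step-down rule just stated can be sketched as below. This is an illustration only: the function name is hypothetical, and the caller supplies the upper-tail probability of the relevant F-distribution. In the example the closed form $(1 + 2f/\nu)^{-\nu/2}$, valid for 2 numerator degrees of freedom, stands in for a table look-up.

```python
def step_down_max_f(f_ratios, upper_tail, alpha=0.05):
    """Return how many of the largest F-ratios are declared significant.

    The i-th largest ratio (i = 0, 1, ...) is referred to level alpha / (n - i);
    testing stops at the first ratio that fails.
    """
    ordered = sorted(f_ratios, reverse=True)
    n = len(ordered)
    for i, f in enumerate(ordered):
        if upper_tail(f) > alpha / (n - i):
            return i
    return n

# Example: ratios on (2, 4) degrees of freedom, whose upper tail is (1 + 2f/4)^(-2).
tail_2_4 = lambda f: (1.0 + 2.0 * f / 4.0) ** (-2.0)
print(step_down_max_f([30.0, 8.0], tail_2_4))
print(step_down_max_f([30.0, 5.0], tail_2_4))
```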


Table 3. Upper $100\alpha$ percentage points of the maximum F-ratio followed by the
corresponding $100\alpha/n\,\%$ points of $F$ for the n-factor designs of Table 2




















n a= 0-05 | a=0-01 
| 
oa ae ee ee 
l 2 : 
| \| | 3 4 | 3 4 
r te | —— - aS 
i 3 9-67 35-4 a> |}, er Pau toe 
10-65 49-0 am 26-3 200 | — 
5 366 447 | 586 | 559 | 7-05 10-2 
| 373 | 467 649 | 564 7-23 10-9 
| | 
6 636 | 809 3-55 (4-09)* 440 | 510 (6-10)* 
3-13 3-64 (4:31)* 4-43 5-17 (6-26)* | 
7 | 276 | 3-08 3-41 378 421 en | 
| 278 3-13 3-51 379 | «22 | «76 | 
| | | | | 
| 
gs | 218 2-79 301 | 339 | 368 400 | 
2-20 2-82 3-07 339 | 370 4-03 | 
| 9 2-38 2-60 276 | 310 3-34 356 | 
| 240 2-62 280 | 310 3-35 3-58 | 
| | | 








* These values must, of course, be interpreted without reference to an experimental design as no 
6 x 6 Graeco-Latin square exists. 


3-2. Generalization to unequal numerator degrees of freedom 


The $F_{\max}$ test is intuitively at its best when all numerator degrees of freedom $\nu_t$ are equal.
In the more general case it is still possible to determine exact significance levels of $F_{\max}$ by
means of (7) provided all the $\nu_t$ are even. However, the probability with which the $t$th factor
is significant by chance is then no longer independent of $t$ but larger for the smaller $\nu_t$. This
difficulty may be overcome by having, for any given design, a specific significance level
$F_t(\alpha)$, say, for the $t$th F-ratio $F_t$, subject to the following conditions:

(i) probability that $F_t$ exceeds $F_t(\alpha)$ is independent of $t$;
(ii) probability that at least one $F_t$ exceeds $F_t(\alpha)$ is $\alpha$.

These conditions result in a set of equations which may be written down by means of (1)
and (7) and solved numerically.
Example. In the case of a $(\nu_1 + 1)$ by $(\nu_2 + 1)$ randomized block experiment $F_1(\alpha)$, $F_2(\alpha)$ are
the solutions of
(i) $\Pr(s_1^2/s^2 > F_1) = \Pr(s_2^2/s^2 > F_2)$,
(ii) $2\Pr(s_1^2/s^2 > F_1) - \Pr(s_1^2/s^2 > F_1,\ s_2^2/s^2 > F_2) = \alpha$,

where $s_1^2$, $s_2^2$ and $s^2$ are mean squares based on $\nu_1$, $\nu_2$ and $\nu_1\nu_2$ degrees of freedom, respectively.
For $\nu_1 = 8$, $\nu_2 = 2$ and $\alpha = 0.05$ we obtain $F_1(0.05) = 3.07$, $F_2(0.05) = 4.58$. These values
may be compared with the $2\tfrac12\%$ significance levels of the corresponding F-ratios, 3.12, 4.69.

An adequate tabulation would, of course, be feasible only in certain simple cases. But in 
many practical situations it will be sufficient to test the largest variance ratio against the 
upper $100\alpha/n\,\%$ point of the appropriate F-ratio. When special accuracy is desired the
approximate values will help in the numerical solution for the somewhat smaller exact 
percentage points. Reference should again be made to Hartley (1955). 


I am indebted to the referee for a number of valuable comments on the first draft of this
paper and to Miss B. C. Halliburton of the C.S.I.R.O. for her careful computation of the
various tables.
$\chi^2$ PROBABILITIES FOR LARGE NUMBERS OF DEGREES
OF FREEDOM


By JOHN WISHART 
Statistical Laboratory, University of Cambridge 


Probability values for $\chi^2$ are given in Pearson & Hartley (1954, Table 7) for $\chi^2$ up to 120 and
for degrees of freedom ($\nu$) not greater than 70. This means that the complete range is not
covered beyond about $\chi^2 = 40$, and no values are available for really high $\chi^2$. Casting about
for a suitable expansion which would make tabulation in this range feasible, the author
first had a look at Karl Pearson's modal expansion in terms of incomplete normal moment
functions (Pearson, 1922, p. xix), a formula which may alternatively be written in terms of
certain $\chi^2$ probabilities, or as a polynomial in a related variate $x'$, combined with chosen
values of the normal probability integral and ordinate corresponding to $x'$ ($P(X)$ and $Z(X)$
in the notation of Pearson & Hartley). Karl Pearson calls attention to the slowness of
convergence of his formula, and evidently did not recommend it highly for use beyond the
tabulated range of the Incomplete Gamma-function Table.

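Going back to first principles, we shall find it convenient to adopt the Pearson–Hartley
notation of using $c$ for $\tfrac12\nu$. The frequency function of $y = \chi^2/\nu = \tfrac12\chi^2/c$ is
$$\frac{c^c}{\Gamma(c)}\,y^{c-1} e^{-cy} \quad (0 < y < \infty),$$
from which that of $z = \tfrac12 \ln y$ is
$$\frac{2c^c}{\Gamma(c)} \exp\{c(2z - e^{2z})\} = \frac{2c^c e^{-c}}{\Gamma(c)} \exp\left[-c\left\{\frac{(2z)^2}{2!} + \frac{(2z)^3}{3!} + \dots\right\}\right] \quad (-\infty < z < \infty). \tag{1}$$
Now write $x = 2z\sqrt{c} = \sqrt{\tfrac12\nu}\,\ln(\chi^2/\nu)$. This leads to the frequency function for $x$
$$\begin{aligned} f(x) &= \frac{c^{c-1/2} e^{-c}}{\Gamma(c)} \exp\left\{-\frac{x^2}{2!} - \frac{x^3}{3!\,c^{1/2}} - \frac{x^4}{4!\,c} - \frac{x^5}{5!\,c^{3/2}} - \frac{x^6}{6!\,c^2} - \dots\right\} \\ &= \frac{\kappa}{\sqrt{2\pi}}\,e^{-x^2/2} \left\{1 - \frac{x^3}{6c^{1/2}} + \frac{x^6 - 3x^4}{72c} - \frac{5x^9 - 45x^7 + 54x^5}{6{,}480\,c^{3/2}} + \frac{5x^{12} - 90x^{10} + 351x^8 - 216x^6}{155{,}520\,c^2}\right\}, \end{aligned} \tag{2}$$
expanding as far as terms in $c^{-2}$, to this point the value of $\kappa$ being
$$\kappa = 1 - \frac{1}{12c} + \frac{1}{288c^2}.$$
The range of $x$ is from $-\infty$ to $+\infty$, and $x = 0$, while not the mean of $x$, corresponds to
$\chi^2 = \nu$, i.e. to the mean of $\chi^2$. Effectively we are transforming $\chi^2$ by replacing the ratio of
$\chi^2$ to its mean, raised to the power of its standard deviation, by $e^x$.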

To obtain $\chi^2$ probabilities we require to integrate $f(x)\,dx$ from $-\infty$ to $X$, where $X$ may be
positive or negative, or from $X$ to $\infty$. The answer can be given in more than one form. It
may be put in terms of the incomplete normal moment functions
$$\mu_n(X) = \frac{1}{\sqrt{2\pi}} \int_0^X x^n e^{-x^2/2}\,dx, \qquad m_n(X) = \frac{\mu_n(X)}{(n-1)(n-3)\cdots 1 \text{ or } 2},$$
according as $n$ is even or odd, for comparison with Karl Pearson's modal expansion. We
then get
Formula A. $X = \sqrt{c}\,\ln(\chi^2/(2c))$:
$$\begin{aligned} \int_0^X f(x)\,dx = \left\{1 - \frac{1}{12c} + \frac{1}{288c^2}\right\} \Big[ & m_0(X) - \frac{1}{3c^{1/2}}\,m_3(X) + \frac{1}{c}\left\{\tfrac{5}{24}\,m_6(X) - \tfrac18\,m_4(X)\right\} \\ & - \frac{1}{c^{3/2}}\left\{\tfrac{8}{27}\,m_9(X) - \tfrac13\,m_7(X) + \tfrac{1}{15}\,m_5(X)\right\} \\ & + \frac{1}{c^2}\left\{\tfrac{385}{1152}\,m_{12}(X) - \tfrac{35}{64}\,m_{10}(X) + \tfrac{91}{384}\,m_8(X) - \tfrac{1}{48}\,m_6(X)\right\} \Big] \end{aligned} \tag{3}$$
as far as terms in $c^{-2}$. Values of $m_1$ to $m_{12}$ are given to seven decimals in the old Tables
(Pearson, 1914, 1931), while $m_0(X) = P(X) - 0.5$ and $P(X)$ is readily available, either in the
old Tables or in the new Biometrika Tables.
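From this formula we can derive a very useful expression for the chance of $\chi^2$ being less 
than, or at most equal to, its degrees of freedom $\nu$. For this will be $\int_{-\infty}^0 f(x)\,dx$, obtained from 
Formula A by putting $X = -\infty$ and reversing the sign, noting that 

$$m_{2r+1}(-\infty) = (2\pi)^{-\frac12} \quad\text{and}\quad m_{2r}(-\infty) = -0.5.$$

Formula A′. 

$$\int_{-\infty}^0 f(x)\,dx = \left(1 - \frac{1}{12c} + \frac{1}{288c^2}\right)\left\{\frac12 + \frac{1}{24c} + \frac{1}{576c^2} + \frac{1}{\sqrt{2\pi c}}\left(\frac13 + \frac{4}{135c}\right)\right\}$$

$$= \frac12 + \frac{1}{3\sqrt{2\pi c}}\left(1 + \frac{1}{180c}\right).^{*} \qquad (4)$$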


This formula will give seven-decimal accuracy for c of the order of 50 or above, i.e. beyond 
the range of the Incomplete Gamma-function Tables (the p of that table is our c—1). 
Adding or subtracting (3) and (4) gives the required $\chi^2$ probabilities. 
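
A small numerical illustration of Formula A′ may help. The sketch below takes the closed form to be $\tfrac12 + \{3\sqrt{2\pi c}\}^{-1}\{1 + 1/(180c)\}$ (an assumed reading of (4)) and compares it, for integer $c$, with the exact value $P(\chi^2 \le \nu) = 1 - e^{-c}\sum_{k<c} c^k/k!$. The function names are illustrative, not the author's.

```python
import math


def formula_a_prime(c):
    """Approximate P(chi2 <= nu) for nu = 2c, assumed reading of Formula A'."""
    return 0.5 + (1 + 1 / (180 * c)) / (3 * math.sqrt(2 * math.pi * c))


def exact_prob_at_mean(c):
    """Exact P(chi2 <= nu) for integer c = nu/2, via the Poisson tail identity."""
    poisson_head = sum(
        math.exp(-c + k * math.log(c) - math.lgamma(k + 1)) for k in range(c)
    )
    return 1.0 - poisson_head


if __name__ == "__main__":
    c = 49  # nu = 98, as in Example (1)
    print(formula_a_prime(c), exact_prob_at_mean(c))
```

With $c = 49$ the two values should agree to about seven decimals, in line with the accuracy claimed for $c$ of the order of 50.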
Alternatively, we may write in (3) 

$$(2\pi)^{\frac12}\,m_{2r+1}(X) = P(X^2 \mid 2r+2), \qquad 2m_{2r}(X) = P(X^2 \mid 2r+1),$$

where $P(X^2 \mid \nu)$ denotes the probability that $\chi^2$ does not exceed $X^2$, for $\nu$ degrees of freedom. 
These probabilities may be obtained to five decimals by subtracting from unity the $\chi^2$ 
probabilities given in the Biometrika Tables, Table 7. 
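By integrating the terms of (2) by parts from $-\infty$ to $X$, we may get the probability 
that $x$ does not exceed $X$ in the form of an expansion in powers of $X$, involving 

$$P(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^X e^{-\frac12x^2}\,dx \quad\text{and}\quad Z(X) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12X^2}.$$

Multiplying in by the outside factor we get our result in the form of 

Formula B. $X = \sqrt{c}\,\ln(\chi^2/(2c))$: 

$$\int_{-\infty}^X f(x)\,dx = P(X) + Z(X)\left\{\frac{X^2 + 2}{6c^{1/2}} - \frac{X^5 + 2X^3 + 6X}{72c} + \frac{5X^8 - 5X^6 + 24X^4 + 6X^2 + 12}{6{,}480c^{3/2}}\right.$$

$$\left. - \frac{5X^{11} - 35X^9 + 36X^7 - 144X^5 - 180X^3 - 540X}{155{,}520c^2} + \dots\right\}. \qquad (5)$$

* An additional term (inside the bracket) is $-25/(2{,}016c^2)$. 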
Now the well-known Cornish-Fisher expansion (Cornish & Fisher, 1937) applies to the 
general $z$ function. We may adapt this to our special case by writing $n_1 = \nu = 2c$, $n_2 = \infty$. 
In the formulae for $a, b, \dots, f$ on p. 30-13 of the reference, we have $\delta = \sigma = (2c)^{-1}$, and if we 
then substitute in the formulae on p. 30-9 we get Formula B above, Cornish & Fisher's $p$, 
$z$ and $\xi$ being replaced by $P(X)$, $Z(X)$ and $X$ respectively. Thus the expansion we have 
obtained is not a new one, but is a special case of a direct Cornish-Fisher expansion in which 
$n_2$ is allowed to go to infinity. Our derivation of this expansion from the frequency function 
of $\chi^2$ by a few lines of algebra is, however, believed to be new. 

The best-known form of the Cornish-Fisher expansion is the inverse one, namely, a formula 
which determines $z_0$ in terms of $X$, where the chance that $z$ is less than, or at most equal to, 
$z_0$ is equal to $P(X)$. A chosen value of $P(X)$ determines $X$ from normal probability tables. 
We can therefore get the inverse form to Formula B by putting $\delta = \sigma = (2c)^{-1}$ in the 
formulae on p. 30-14 of the reference cited, but note that in the penultimate line $\delta^4$ should 
read $\delta^2$. With $z = \tfrac12\ln(\chi^2/\nu)$ we get 

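Formula B′. 

$$z_0 = \frac{X}{2c^{1/2}} - \frac{X^2 + 2}{12c} + \frac{X^3 + 5X}{72c^{3/2}} - \frac{6X^4 + 59X^2 + 58}{3{,}240c^2} + \frac{9X^5 + 232X^3 + 599X}{77{,}760c^{5/2}}. \qquad (6)$$

Corresponding to a chosen value of the probability that $\chi^2$ be not exceeded, we then obtain 
the corresponding $\chi^2$ as $\nu e^{2z_0}$ from tables of natural logarithms. 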

A footnote to Table 8 in the Biometrika Tables (Pearson & Hartley, 1954) gives two 
approximations for $\nu > 100$. One which is practically as good as the Wilson-Hilferty cube-root 
approximation, and better than Fisher's $\sqrt{2\chi^2}$, can be obtained from the first two 
terms of Formula B′ by putting $\ln(\chi^2/\nu) = X\sqrt{2/\nu} - (X^2 + 2)/(3\nu)$. Equally, a fair 
approximation to the probabilities beyond the range of Table 7 may be got from the first two 
terms of (5). 


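Examples. 
(1) $\nu = 98$, $c = 49$. Let $\chi^2 = 98$. Use Formula A′. 

$$P(\chi^2 \le 98) = \frac12 + \frac{1}{3\sqrt{98\pi}}\left(1 + \frac{1}{8{,}820}\right) = 0.5189994.$$

From the Incomplete Gamma-function Tables 
$u = \chi^2/\sqrt{2\nu} = 7$, $p = 48$, $I(u, p) = 0.5189993$. 
(2) $\nu = 100$, $c = 50$. Let $\chi^2 = 124.342$. Use Formula B. 
$z = \tfrac12\ln 1.24342 = 0.108933$, $X = 2z\sqrt{c} = 1.5405453$, 

$$\begin{aligned}
P(\chi^2 \le 124.342) = {} & 0.93828625 \\
& + 0.01255249 \\
& - 0.00085353 \\
& + 0.00001346 \\
& + 0.00000098 \\
= {} & 0.9499996
\end{aligned}$$

Biometrika Tables (Table 8) give 0.05 point for $\chi^2$ as 124.342. 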








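(3) $\nu = 100$, $c = 50$. Let $Q(\chi_0^2) = P(\chi^2 > \chi_0^2) = 0.05$. Use Formula B′. 
$X = 1.64485363$, 

$$\begin{aligned}
z_0 = {} & 0.11630872 \\
& - 0.00784257 \\
& + 0.00049790 \\
& - 0.00003229 \\
& + 0.00000155 \\
= {} & 0.1089333, \quad\text{giving } \chi_0^2 = 124.3421.
\end{aligned}$$

(4) $\nu = 100$, $c = 50$. Let $\chi^2 = 77.9295$. Use Formula B. 
$z = -0.1246828$, $X = -1.7632811$, 

$$\begin{aligned}
P(\chi^2 \le 77.9295) = {} & 0.03892655 \\
& + 0.01015025 \\
& + 0.00090351 \\
& + 0.00002132 \\
& - 0.00000123 \\
= {} & 0.0500004
\end{aligned}$$

Biometrika Tables (Table 8) give 0.95 point for $\chi^2$ as 77.9295. 
(5) $\nu = 100$, $c = 50$. Let $P(\chi_0^2) = P(\chi^2 \le \chi_0^2) = 0.05$. Use Formula B′. 

$$\begin{aligned}
z_0 = {} & -0.11630872 \\
& - 0.00784257 \\
& - 0.00049790 \\
& - 0.00003229 \\
& - 0.00000155 \\
= {} & -0.1246830, \quad\text{giving } \chi_0^2 = 77.9295.
\end{aligned}$$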
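
The worked examples can be reproduced with a few lines of code. The following is a minimal sketch, assuming Formula B is read as $P(X) + Z(X)\{(X^2+2)/(6c^{1/2}) - (X^5+2X^3+6X)/(72c) + \dots\}$ and Formula B′ as $z_0 = X/(2c^{1/2}) - (X^2+2)/(12c) + (X^3+5X)/(72c^{3/2}) - (6X^4+59X^2+58)/(3240c^2) + (9X^5+232X^3+599X)/(77760c^{5/2})$; the function names are hypothetical, and the standard-library `statistics.NormalDist` stands in for the normal tables.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal, replacing the printed tables


def chi2_cdf_formula_b(chi2, nu):
    """Approximate P(chi-square <= chi2) on nu degrees of freedom (Formula B)."""
    c = nu / 2.0
    X = math.sqrt(c) * math.log(chi2 / nu)
    bracket = (
        (X**2 + 2) / (6 * c**0.5)
        - (X**5 + 2 * X**3 + 6 * X) / (72 * c)
        + (5 * X**8 - 5 * X**6 + 24 * X**4 + 6 * X**2 + 12) / (6480 * c**1.5)
        - (5 * X**11 - 35 * X**9 + 36 * X**7 - 144 * X**5 - 180 * X**3 - 540 * X)
        / (155520 * c**2)
    )
    return _NORMAL.cdf(X) + _NORMAL.pdf(X) * bracket


def chi2_quantile_formula_b_prime(prob, nu):
    """Approximate chi-square value not exceeded with probability prob (Formula B')."""
    c = nu / 2.0
    X = _NORMAL.inv_cdf(prob)
    z0 = (
        X / (2 * c**0.5)
        - (X**2 + 2) / (12 * c)
        + (X**3 + 5 * X) / (72 * c**1.5)
        - (6 * X**4 + 59 * X**2 + 58) / (3240 * c**2)
        + (9 * X**5 + 232 * X**3 + 599 * X) / (77760 * c**2.5)
    )
    return nu * math.exp(2 * z0)


if __name__ == "__main__":
    print(chi2_cdf_formula_b(124.342, 100))         # Example (2)
    print(chi2_quantile_formula_b_prime(0.95, 100))  # Example (3)
    print(chi2_cdf_formula_b(77.9295, 100))          # Example (4)
    print(chi2_quantile_formula_b_prime(0.05, 100))  # Example (5)
```

The printed values should lie close to 0.95, 124.342, 0.05 and 77.9295 respectively, as in Examples (2) to (5).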





The urge to undertake this investigation might have been lacking had Slutskii's Tables 
(1950) been available earlier to the author. For Slutskii tabulates the $\chi^2$ probabilities 
corresponding to $t = \sqrt{2\chi^2} - \sqrt{2\nu}$, first for $\nu = 6$ to 32 (Table III) and secondly for 
$\sqrt{2/\nu} = 0$ to 0.25, i.e. for $\nu = \infty$ down to 32 (Table IV). The tables are given to five decimal 
places, and Table IV, in particular, is a very useful supplement to Pearson & Hartley's 
Table 7. Formulae A and B above may, nevertheless, be useful for tabulating $\chi^2$ 
probabilities to considerable accuracy, or, in abbreviated form, for obtaining approximate 
probabilities without the use of tables. They are also interesting in their own right. They are 
special cases of a general expansion which the author has since worked out, and which will 
be given in a separate paper. 
THE SAMPLING DISTRIBUTION OF A 
MAXIMUM-LIKELIHOOD ESTIMATE 


By J. B. S. HALDANE and SHEILA MAYNARD SMITH 
University College London 


The method of maximum likelihood was used by Edgeworth (1908) and others. Since 
Fisher’s (1921) paper, which gave a general expression for the variance of an estimate by it, 
it has been very widely used. However, no serious attempt to determine the form of its 
sampling distribution when the number in the sample is not very large was made in the 
next thirty years, though Hotelling (1930) and others improved the rigour of earlier proofs. 
Haldane (1953a) gave its bias, but gave an incorrect expression for its third moment. These 
were, however, obtained as special cases of a more general theorem. The paper is, moreover, 
hard to follow owing to numerous misprints, and several errors, some, at least, of which are 
here corrected. In this paper we obtain the first four moments of the distribution by a more 
direct approach. 

Throughout we consider the estimation of a single unknown parameter $\xi$. Let $x$ be the 
maximum-likelihood estimate of $\xi$, based on a sample of $N$ members. Before considering 
samples from a continuous distribution, we suppose that the population sampled can be 
classified into a finite or denumerable set of classes, that the number of individuals in the 
sample falling into the $r$th class is $n_r$, and that the expectation of $n_r$ on some hypothesis is 

$$\mathcal{E}(n_r) = Nf_r(\xi).$$

The permissible range of values of $\xi$ is that for which no $f_r(\xi)$ is negative. We assume that 
over this range, which may consist of several discrete parts, every $f_r(\xi)$ is regular. In 
particular, we assume that, even at the boundaries of this range, $\xi$ remains finite. 


If this last restriction is not made, some or all the moments of the sampling distribution 
of $x$ are infinite. Thus if 

$$\mathcal{E}(n_1) = \frac{N\xi}{1+\xi}, \quad \mathcal{E}(n_2) = \frac{N}{1+\xi}, \quad\text{then}\quad x = \frac{n_1}{n_2}.$$

Hence $\mathcal{E}(x)$ is infinite, since $n_2$ has a finite probability of being zero. In such estimations the 
maximum-likelihood estimate is seriously biased even if cases where one class is empty are 
excluded. Thus in the above example $x$ still has a bias $+N^{-1}\xi(1+\xi) + O(N^{-2})$ when cases 
with $n_2 = 0$ are omitted, whilst $x' = n_1/(n_2 + 1)$ has a bias tending to zero more rapidly than 
any negative power of $N$. This criticism applies to some other efficient estimators, for 
example, the 'product method' which Fisher (1954) recommends for linkage estimation. 


If 

$$f_r(\xi) = \binom{m}{r}\xi^r(1-\xi)^{m-r},$$

then $x = \sum_r rn_r/(mN)$, and its sampling distribution is binomial, with cumulants 

$$\kappa_1 = \xi, \quad \kappa_2 = (mN)^{-1}\xi(1-\xi), \quad \kappa_3 = (mN)^{-2}\xi(1-\xi)(1-2\xi), \text{ etc.}$$

In this case, the maximum-likelihood estimate is absolutely unbiased. The same is true for 
a Poisson distribution, or for a distribution in which $h + k\xi$ is substituted for $\xi$ in $f_r(\xi)$ above. 
We conjecture that it is biased in all other cases except for special values of $\xi$. It will be 
shown that in the important case where expectations are linear functions of $\xi$, that is to say, 
$f_r(\xi) = h_r + k_r\xi$, the bias is of order $N^{-2}$, and thus usually negligible, whereas in other cases 
it tends to zero with $N^{-1}$. 


SYMBOLISM AND PRELIMINARY LEMMATA 
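Let $f_r(\xi) = a_r$, $f_r'(\xi) = b_r$, $f_r''(\xi) = c_r$, $f_r'''(\xi) = d_r$, $f_r^{iv}(\xi) = e_r$. 
Since $\sum_r f_r(u) = 1$ for all $u$, it follows that $\Sigma a_r = 1$, $\Sigma b_r = \Sigma c_r$ etc. $= 0$. Clearly $a_r$, $b_r$, etc., 
are constant over any series of samples from a given population, though their values may 
have to be estimated. 

Let $x = \xi + y$, so that $y$ is the error of the estimate. 
Let $n_r = N(a_r + z_r)$. 
Let 

$$A_i = \sum_r a_r^{-i}b_r^{i+1}, \quad B_i = \sum_r a_r^{-i}b_r^{i}c_r, \quad C_i = \sum_r a_r^{-i}b_r^{i-1}c_r^2, \quad D_i = \sum_r a_r^{-i}b_r^{i}d_r, \quad S_i = \sum_r h_r^i a_r,$$

where $h_r$ is arbitrary. 
Let 

$$\alpha_i = \sum_r a_r^{-i}b_r^{i}z_r, \quad \beta_i = \sum_r a_r^{-i}b_r^{i-1}c_rz_r, \quad \delta_i = \sum_r a_r^{-i}b_r^{i-1}d_rz_r,$$

$$\theta = \alpha_2 - \beta_1, \quad \Delta = 2\alpha_3 - 3\beta_2 + \delta_1.$$

Suffixes will be dropped when the meaning is clear. Thus $\Sigma a^{-1}bz$ means $\sum_r a_r^{-1}b_rz_r$, and so on. 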


It is at once obvious from Taylor's theorem that 

$$f_r(x) = a_r + b_ry + \tfrac12c_ry^2 + \tfrac16d_ry^3 + \dots, \qquad f_r'(x) = b_r + c_ry + \tfrac12d_ry^2 + \tfrac16e_ry^3 + \dots \qquad (1)$$

The symbolism is that of Haldane (1953a). The quantities $A_i$, etc., are connected by 
simple relationships such as 

$$\frac{dA_i}{d\xi} = -iA_{i+1} + (i+1)B_i, \qquad \frac{dB_i}{d\xi} = -iB_{i+1} + iC_i + D_i.$$

$A_i$ and $C_i$ are positive when $i$ is odd. $A_i$ becomes large when any $a_r$ is small, or any $b_r$ large, 
that is to say, for values of $\xi$ for which any $f_r(\xi)$ is small, or is changing rapidly. $B_i$ and $D_i$ 
are usually large under the same conditions, and when a first or second derivative of any 
$f_r(\xi)$ is changing rapidly. Such a term as $A_1^{-3/2}A_2$, which occurs in the expression for the 
skewness of the distribution of $x$, is usually large if any $f_r(\xi)$ is small, and so on. 

In order to obtain expressions for such expectations as $\mathcal{E}(\alpha_1^4)$ and $\mathcal{E}(\alpha_1^2\theta)$ we require 
general expressions for the expectations of powers and products such as $(\Sigma h_rz_r)^4$ and 
$(\Sigma h_rz_r)^3\,\Sigma k_rz_r$, where $h_r$ and $k_r$ are constants. First consider the sampling distribution of 
$\Sigma h_rz_r$. Then if $S_i = \sum_r h_r^ia_r$ it is easy to show, from the moments of the multinomial 
distribution, that the expectation of $\sum_r h_rz_r$ is zero, and its first few moments are: 

$$\begin{aligned}
\mu_2 &= N^{-1}(S_2 - S_1^2), \\
\mu_3 &= N^{-2}(S_3 - 3S_1S_2 + 2S_1^3), \\
\mu_4 &= 3N^{-2}(S_2 - S_1^2)^2 + N^{-3}(S_4 - 4S_1S_3 - 3S_2^2 + 12S_1^2S_2 - 6S_1^4), \\
\mu_5 &= 10N^{-3}(S_2 - S_1^2)(S_3 - 3S_1S_2 + 2S_1^3) + O(N^{-4}), \\
\mu_6 &= 15N^{-3}(S_2 - S_1^2)^3 + O(N^{-4}).
\end{aligned} \qquad (2)$$
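For example, 

$$\mathcal{E}(z_r^3) = N^{-2}a_r(1 - a_r)(1 - 2a_r), \quad \mathcal{E}(z_r^2z_s) = -N^{-2}a_ra_s(1 - 2a_r), \quad \mathcal{E}(z_rz_sz_t) = 2N^{-2}a_ra_sa_t.$$

$$(\Sigma h_rz_r)^3 = \Sigma h_r^3z_r^3 + 3\Sigma\Sigma h_r^2h_sz_r^2z_s + 6\Sigma\Sigma\Sigma h_rh_sh_tz_rz_sz_t.$$

$$\begin{aligned}
\text{So}\quad \mathcal{E}[(\Sigma h_rz_r)^3] &= N^{-2}[\Sigma h_r^3(a_r - 3a_r^2 + 2a_r^3) + 3\Sigma\Sigma h_r^2h_s(-a_ra_s + 2a_r^2a_s) + 12\Sigma\Sigma\Sigma h_rh_sh_ta_ra_sa_t] \\
&= N^{-2}[\Sigma h^3a - 3\Sigma ha\,\Sigma h^2a + 2(\Sigma ha)^3] \\
&= N^{-2}(S_3 - 3S_1S_2 + 2S_1^3).
\end{aligned}$$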
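
The third-moment result can be checked by brute force for a small sample. The sketch below, with illustrative names and an arbitrary choice of class probabilities and weights, enumerates every multinomial outcome, computes $\mathcal{E}[(\Sigma h_rz_r)^3]$ with $z_r = n_r/N - a_r$, and compares it with $N^{-2}(S_3 - 3S_1S_2 + 2S_1^3)$, where $S_i = \Sigma h_r^ia_r$.

```python
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of the given class counts under a multinomial law."""
    coef = factorial(sum(counts))
    for k in counts:
        coef //= factorial(k)
    p = float(coef)
    for k, q in zip(counts, probs):
        p *= q**k
    return p


def exact_third_moment(h, a, N):
    """E[(sum_r h_r z_r)^3] by enumerating all samples of size N."""
    total = 0.0
    for counts in product(range(N + 1), repeat=len(a)):
        if sum(counts) != N:
            continue
        t = sum(hr * (nr / N - ar) for hr, nr, ar in zip(h, counts, a))
        total += multinomial_pmf(counts, a) * t**3
    return total


def formula_third_moment(h, a, N):
    """N^-2 (S3 - 3 S1 S2 + 2 S1^3), with S_i = sum_r h_r^i a_r."""
    def S(i):
        return sum(hr**i * ar for hr, ar in zip(h, a))
    return (S(3) - 3 * S(1) * S(2) + 2 * S(1) ** 3) / N**2


if __name__ == "__main__":
    a = [0.5, 0.3, 0.2]   # class probabilities (illustrative)
    h = [1.0, -2.0, 0.5]  # arbitrary weights (illustrative)
    print(exact_third_moment(h, a, 6), formula_third_moment(h, a, 6))
```

The two printed numbers should coincide up to rounding, since this particular moment formula holds exactly and not merely to leading order.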
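The leading terms of $\mu_4$, $\mu_5$ and $\mu_6$ can be derived directly from $\mu_2$ and $\mu_3$. From (2) we 
can at once obtain the expectations of powers. For 

$$\alpha_1 = \Sigma a^{-1}bz, \quad\text{so}\quad h_r = a_r^{-1}b_r,$$

and $S_1 = 0$, $S_2 = \Sigma a^{-1}b^2 = A_1$, $S_3 = A_2$, $S_4 = A_3$. 
So from (2), $\mathcal{E}(\alpha_1^4) = 3N^{-2}A_1^2 + N^{-3}(A_3 - 3A_1^2)$. 
To obtain expectations of products, the expressions (2) must be expanded if necessary. 
For example 

$$\mu_5 = 2N^{-3}(5S_2S_3 - 5S_1^2S_3 - 15S_1S_2^2 + 25S_1^3S_2 - 10S_1^5) + O(N^{-4}).$$

In $\mathcal{E}[(\Sigma h_rz_r)^4(\Sigma k_rz_r)]$ one of the five $h$'s in turn is replaced by $k$. Thus 

$$\mathcal{E}[(\Sigma h_rz_r)^4(\Sigma k_rz_r)] = 2N^{-3}[3(S_2 - S_1^2)(\Sigma h^2ka - S_2\Sigma ka) + 2(S_3 - 6S_1S_2 + 5S_1^3)(\Sigma hka - S_1\Sigma ka)] + O(N^{-4}).$$

From such expressions we immediately evaluate expectations of products. For example, in 

$$\alpha_1^4\theta = \alpha_1^4(\alpha_2 - \beta_1), \quad h = a^{-1}b, \quad k = a^{-2}b^2 - a^{-1}c.$$

So $S_1 = 0$, $S_2 = A_1$, $S_3 = A_2$, $\Sigma hka = A_2 - B_1$, $\Sigma h^2ka = A_3 - B_2$, $\Sigma ka = A_1$. 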
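The required expectations are as follows: 

$$\begin{aligned}
\mathcal{E}(\alpha_1) &= 0, \\
\mathcal{E}(\alpha_1^2) &= N^{-1}A_1, \qquad \mathcal{E}(\alpha_1\theta) = N^{-1}(A_2 - B_1), \\
\mathcal{E}(\alpha_1^3) &= N^{-2}A_2, \qquad \mathcal{E}(\alpha_1^2\theta) = N^{-2}(A_3 - B_2 - A_1^2), \\
\mathcal{E}(\alpha_1^4) &= 3N^{-2}A_1^2 + N^{-3}(A_3 - 3A_1^2), \\
\mathcal{E}(\alpha_1^3\theta) &= 3N^{-2}A_1(A_2 - B_1) + O(N^{-3}), \\
\mathcal{E}(\alpha_1^3\Delta) &= 3N^{-2}A_1(2A_3 - 3B_2 + D_1) + O(N^{-3}), \\
\mathcal{E}(\alpha_1^2\theta^2) &= N^{-2}[2(A_2 - B_1)^2 + A_1(A_3 - 2B_2 + C_1) - A_1^3] + O(N^{-3}), \\
\mathcal{E}(\alpha_1^5) &= 10N^{-3}A_1A_2 + O(N^{-4}), \\
\mathcal{E}(\alpha_1^4\theta) &= 2N^{-3}[2A_2(A_2 - B_1) + 3A_1(A_3 - B_2) - 3A_1^3] + O(N^{-4}), \\
\mathcal{E}(\alpha_1^6) &= 15N^{-3}A_1^3 + O(N^{-4}), \\
\mathcal{E}(\alpha_1^5\theta) &= 15N^{-3}A_1^2(A_2 - B_1) + O(N^{-4}), \\
\mathcal{E}(\alpha_1^5\Delta) &= 15N^{-3}A_1^2(2A_3 - 3B_2 + D_1) + O(N^{-4}), \\
\mathcal{E}(\alpha_1^4\theta^2) &= 3N^{-3}A_1[4(A_2 - B_1)^2 + A_1(A_3 - 2B_2 + C_1) - A_1^3] + O(N^{-4}).
\end{aligned} \qquad (3)$$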
Haldane (1953a) gave an incorrect expression for $\mathcal{E}[(\Sigma hz)^2(\Sigma kz)^2]$ in his equation (4). His 


equation (6) should read 
& (ai9,) = N-*(2A,+ A, H)). 


CALCULATION OF THE MOMENTS AND CUMULANTS 


The maximum-likelihood estimate is a root of 

$$\sum_r n_rf_r'(x)[f_r(x)]^{-1} = 0. \qquad (4)$$

There is rarely if ever any doubt as to which root is the estimator. Substituting from (1), 
provided that $a_r$ is not zero, and summing, we obtain 

$$\alpha_1 - (A_1 + \theta)y + \tfrac12(2A_2 - 3B_1 + \Delta)y^2 - \tfrac16(6A_3 - 12B_2 + 3C_1 + 4D_1)y^3 + \dots = 0.$$

These terms are sufficient to calculate the fourth moment of $y$ to order $N^{-3}$. The first term 
tends to zero with $N^{-1/2}$, while the coefficients of powers of $y$ tend to constant values with 
large $N$. So when $N$ is large we may invert this series, which is the expansion of 

$$N^{-1}\frac{d}{dy}\sum_r n_r\ln f_r(\xi + y),$$

by Lagrange's theorem. 
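We thus find 

$$\begin{aligned}
y = {} & A_1^{-1}\alpha_1 + A_1^{-3}[(A_2 - \tfrac32B_1)\alpha_1 - A_1\theta]\alpha_1 \\
& + A_1^{-5}[\{2(A_2 - \tfrac32B_1)^2 - A_1(A_3 - 2B_2 + \tfrac12C_1 + \tfrac23D_1)\}\alpha_1^2 \\
& \quad - 3A_1(A_2 - \tfrac32B_1)\alpha_1\theta + \tfrac12A_1^2\alpha_1\Delta + A_1^2\theta^2]\alpha_1 + O(N^{-2}).
\end{aligned} \qquad (5)$$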


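From this fundamental equation and equations (3) the moments of the distribution of 
$y$ are calculated. In order to evaluate $\kappa_4$ we need two terms in the expansion of $\mu_4$. 

$$\mathcal{E}(y) = -\tfrac12N^{-1}A_1^{-2}B_1 + O(N^{-2}).$$

This is the bias of $x$. 

$$\mathcal{E}(y^2) = N^{-1}A_1^{-1} + N^{-2}A_1^{-4}[-A_2^2 + \tfrac{15}{4}B_1^2 + A_1(A_3 - B_2 - D_1) - A_1^3] + O(N^{-3}).$$

Subtracting $[\mathcal{E}(y)]^2$, 

$$\mu_2 = N^{-1}A_1^{-1} + N^{-2}A_1^{-4}[-A_2^2 + \tfrac72B_1^2 + A_1(A_3 - B_2 - D_1) - A_1^3] + O(N^{-3}).$$

Thus $A_1$ is the amount of information about $\xi$ per individual in the sample. 

$$\mathcal{E}(y^3) = N^{-2}A_1^{-3}(A_2 - \tfrac92B_1) + O(N^{-3}).$$

Subtracting $3\mathcal{E}(y)\mathcal{E}(y^2) - 2[\mathcal{E}(y)]^3$, 

$$\mu_3 = N^{-2}A_1^{-3}(A_2 - 3B_1) + O(N^{-3}),$$

$$\mathcal{E}(y^4) = 3N^{-2}A_1^{-2} + N^{-3}A_1^{-5}[-6A_2^2 - 14A_2B_1 + \tfrac{105}{2}B_1^2 + A_1(7A_3 - 6B_2 - 10D_1) - 9A_1^3] + O(N^{-4}).$$

Subtracting $4\mathcal{E}(y)\mathcal{E}(y^3) - 6\mathcal{E}(y^2)[\mathcal{E}(y)]^2 + 3[\mathcal{E}(y)]^4$, 

$$\mu_4 = 3N^{-2}A_1^{-2} + N^{-3}A_1^{-5}[-6A_2^2 - 12A_2B_1 + 45B_1^2 + A_1(7A_3 - 6B_2 - 10D_1) - 9A_1^3] + O(N^{-4}),$$

$$\kappa_4 = \mu_4 - 3\mu_2^2 = N^{-3}A_1^{-5}[-12B_1(A_2 - 2B_1) + A_1(A_3 - 4D_1) - 3A_1^3] + O(N^{-4}).$$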
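Thus the first four cumulants of the distribution of $x$ are: 

$$\begin{aligned}
\kappa_1 &= \xi - \tfrac12N^{-1}A_1^{-2}B_1 + O(N^{-2}), \\
\kappa_2 &= N^{-1}A_1^{-1} + N^{-2}A_1^{-4}[-A_2^2 + \tfrac72B_1^2 + A_1(A_3 - B_2 - D_1) - A_1^3] + O(N^{-3}), \\
\kappa_3 &= N^{-2}A_1^{-3}(A_2 - 3B_1) + O(N^{-3}), \\
\kappa_4 &= N^{-3}A_1^{-5}[-12B_1(A_2 - 2B_1) + A_1(A_3 - 4D_1) - 3A_1^3] + O(N^{-4}),
\end{aligned} \qquad (6)$$

whence 

$$\gamma_1 = N^{-1/2}A_1^{-3/2}(A_2 - 3B_1) + O(N^{-3/2}),$$

$$\gamma_2 = N^{-1}A_1^{-3}[-12B_1(A_2 - 2B_1) + A_1(A_3 - 4D_1) - 3A_1^3] + O(N^{-2}).$$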


If every $f_r(\xi)$ is linear, that is to say, of the form $k_r + b_r\xi$, then every $c_r$ and $d_r$, and hence 
every $B_i$, $C_i$ and $D_i$ vanishes, and the maximum-likelihood estimate is almost unbiased. It 
has, however, a bias of order $N^{-2}$, evaluated by Haldane (1953a) and given below. This can 
perhaps always be neglected. The cumulants of the distribution of $x$ are: 

$$\begin{aligned}
\kappa_1 &= \xi + 2N^{-2}A_1^{-5}(A_2^3 - 2A_1A_2A_3 + A_1^2A_4) + O(N^{-3}), \\
\kappa_2 &= N^{-1}A_1^{-1} - N^{-2}A_1^{-4}(A_2^2 - A_1A_3 + A_1^3) + O(N^{-3}), \\
\kappa_3 &= N^{-2}A_1^{-3}A_2 + O(N^{-3}), \\
\kappa_4 &= N^{-3}A_1^{-4}(A_3 - 3A_1^2) + O(N^{-4}),
\end{aligned} \qquad (7)$$

whence 

$$\gamma_1 = N^{-1/2}A_1^{-3/2}A_2 + O(N^{-3/2}), \qquad \gamma_2 = N^{-1}(A_1^{-2}A_3 - 3) + O(N^{-2}).$$

Thus $\gamma_2$ is positive near the limits of the range of $\xi$, where at least one value of $a_r$ is small, 
and hence $A_1^{-2}A_3$ is large. It may be negative in the middle of the range. 


Now in practice the constants $A_i$, $B_i$, etc., are unknown. They must be estimated from $x$, 
the estimate of $\xi$. If we put $A_i'$ for the maximum-likelihood estimate of $A_i$, namely, 

$$\Sigma[f_r(x)]^{-i}[f_r'(x)]^{i+1},$$

then $A_1' = A_1 - (A_2 - 2B_1)y + (A_3 - \tfrac52B_2 + C_1 + D_1)y^2 + \dots$. 
So $\mathcal{E}(A_1') = A_1 + N^{-1}A_1^{-1}[A_1^{-1}B_1(\tfrac12A_2 - B_1) + A_3 - \tfrac52B_2 + C_1 + D_1] + O(N^{-2})$, 
and similar expressions can be found for other estimates. Thus (6) and (7) remain true when 
$A_i'$ is substituted for $A_i$ and so on, except that the second term in the sampling variance has 
no validity. 


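We may also use the fact that $f_r(x) = N^{-1}n_r + O(N^{-1/2})$ and use the approximations 

$$A_i' = N^i\,\Sigma n_r^{-i}[f_r'(x)]^{i+1}, \qquad B_i' = N^i\,\Sigma n_r^{-i}[f_r'(x)]^{i}f_r''(x),$$

$$C_i' = N^i\,\Sigma n_r^{-i}[f_r'(x)]^{i-1}[f_r''(x)]^2, \qquad D_i' = N^i\,\Sigma n_r^{-i}[f_r'(x)]^{i}f_r'''(x)$$

for $A_i$, $B_i$, $C_i$ and $D_i$ in expressions (6) and (7) without introducing serious errors provided 
no $n_r$ is small. 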

We have so far neglected the possibility that any $a_r$ is zero. This can only be the case if, 
in equation (4), at least one $n_r$ is zero, though it may be necessary that several $n_r$'s should 
vanish. Thus if 

$$f_1(x) = x, \quad f_2(x) = x^2, \quad f_3(x) = 1 - x - x^2,$$

equation (4) gives 

$$(n_1 + 2n_2 + 2n_3)x^2 + (n_1 + 2n_2 + n_3)x - (n_1 + 2n_2) = 0,$$

and since no $n_r$ can be negative, $x$ can only be zero if $n_1 = n_2 = 0$. 
If some $a_r$, say $a_1$, is zero, then $n_1$ is zero. If no other $a_r$ is zero, (4) becomes $f_1(x) = 0$, and 
the relevant solution is $x = \xi$, where $f_1(\xi) = a_1 = 0$. That is to say, the estimate is absolutely 
unbiased. Since $a_1 = 0$, $A_1$ is infinite, that is to say, $x$ is not only an unbiased estimate, but 
a precise estimate. However, we can never know that this is the case on the basis of a finite 
sample. The numbers $A_i$, $B_i$, etc., all become infinite if any $a_r$ vanishes, and the estimates 
of them become meaningless if any $f_r(x)$ is zero, and unreliable if it is sufficiently small. This 
is of course paralleled in the case of simple sampling when no members of a sample possess 
some attribute. This problem has been discussed since the time of Laplace, and a further 
discussion would be out of place here. To sum up, if there is a set of one or more values of 
$n_r$ whose expectation is zero for some value of $\xi$, then if all these values are zero the expressions 
found are inapplicable, and if their total is small (say less than 5) they become very 
inaccurate. 

So far we have supposed that the sample is a sample of a discontinuous variate. Let us now 
consider a continuous variate $p$, whose distribution is given by: 

$$dF = f(p, \xi)\,dp,$$

where $f$ is a known function, and $\xi$ an unknown parameter to be estimated. Suppose that 
in a sample of $N$ members $p$ has assumed the values $p_1, p_2, \dots, p_r, \dots, p_N$, then the maximum-likelihood 
estimator is a root of 

$$\sum_{r=1}^{N}\left\{\frac{\partial}{\partial x}f(p_r, x)\,[f(p_r, x)]^{-1}\right\} = 0,$$

and we have only to put 

$$a_r = f(p_r, x), \quad b_r = \frac{\partial}{\partial x}f(p_r, x), \quad c_r = \frac{\partial^2}{\partial x^2}f(p_r, x), \quad d_r = \frac{\partial^3}{\partial x^3}f(p_r, x),$$

and $A_i = \Sigma a_r^{-i}b_r^{i+1}$, etc., to be able to use formulae (6) and (7). It follows that the maximum-likelihood 
estimate of the mean of a normal distribution is absolutely unbiased, and that if 
$f(p, \xi)$ is of the form $\xi f_1(p) + (1 - \xi)f_2(p)$, 
where $f_1(p)$ and $f_2(p)$ are known frequency functions, the maximum-likelihood estimate of $\xi$ 
has a bias tending to zero with $N^{-2}$. In practice, unless $N$ were quite small, we should group 
our data, and $f_r(x)$ would be $f(p_r, x)$, where $p_r$ is the central value of $p$ in the $r$th interval. 


DISCUSSION 


We first correct an error of Haldane (1953a). In his equations (12) the expression for the 
third cumulant of the generalized estimate should read 


$$\kappa_3 = A_1^{-3}N^{-2}[A_2 - 3B_1] + O(N^{-3}).$$

Since $\kappa_3$ is independent of $k$ the skewness of the distribution of the minimum discrepancy 
estimate (k = +1), or of the estimate found by putting k = 2 so that 
Lln,(m, + f(x) {f.(e)}*] = 0, 
> 


which is nearly equivalent to minimizing $\chi^2$, is the same as the skewness of the maximum-likelihood 
estimate. 

The statistical importance of a knowledge of the distribution of $x$ can best be judged from 
examples. Let us consider the case where 

$$f_1(x) = \tfrac14(2 + x), \quad f_2(x) = \tfrac12(1 - x), \quad f_3(x) = \tfrac14x,$$

which arises in linkage estimation from $F_2$ data, and has been extensively discussed by 
Fisher (1954) and Haldane (1953a). The maximum-likelihood solution is the positive root of 
$Nx^2 - (n_1 - 2n_2 - n_3)x - 2n_3 = 0$, an alternative efficient estimate by the method of minimum 
discrepancy (Haldane, 1953a) being 


_— (4g + 1) (2m, — 2g + 1) 
(my + 1) (my+ 1) +4(my + 1) (mg +1) + (mg +1) (m3 +1) 
Here $B_i$, $C_i$, etc., are zero and we can use (7): 

$$A_1 = \tfrac12(1 + 2x)[x(1 - x)(2 + x)]^{-1}, \qquad A_2 = \tfrac12(2 - 2x - 5x^2 - 4x^3)[x(1 - x)(2 + x)]^{-2},$$

$$A_3 = \tfrac12(4 - 6x - 3x^2 + 14x^3 + 12x^4 + 6x^5)[x(1 - x)(2 + x)]^{-3}.$$

Hence the bias of $x$ is $72x(2 + x)(1 - x)(1 + 2x)^{-5}N^{-2}$, which becomes $+144xN^{-2}$ when 
$x$ is very small, and $+\tfrac89(1 - x)N^{-2}$ when $x$ is nearly unity. It is always negligible. 
The variance, as is well known, is $2x(1 - x)(2 + x)[(1 + 2x)N]^{-1}$ and 

$$\gamma_1 = (2 - 2x - 5x^2 - 4x^3)[\tfrac12x(1 - x)(1 + 2x)^3(2 + x)N]^{-1/2},$$

$$\gamma_2 = (8 - 18x - 27x^2 + 19x^3 + 48x^4 + 24x^5)[x(1 - x)(2 + x)(1 + 2x)^2]^{-1}N^{-1}.$$

Thus $\gamma_1$ changes sign when $x = 0.418$. $\gamma_2$ is negative if $x$ lies between 0.353 and 0.611. Thus 
over this range the maximum-likelihood estimate is rather more precise than it would be 
were it normally distributed. When $x$ or $1 - x$ is small, it is less precise. When $x = \tfrac12$ and 
$N = 100$, $\gamma_1 = -0.0474$ and $\gamma_2 = -0.0065$. 
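
For readers who wish to experiment with the linkage example, here is a minimal sketch. It assumes the class probabilities $\tfrac14(2+x)$, $\tfrac12(1-x)$, $\tfrac14x$, takes the estimate as the positive root of $Nx^2 - (n_1 - 2n_2 - n_3)x - 2n_3 = 0$, and evaluates the leading-order variance, $\gamma_1$ and $\gamma_2$ as read from the text. The function names are illustrative only.

```python
import math


def linkage_mle(n1, n2, n3):
    """Positive root of N x^2 - (n1 - 2 n2 - n3) x - 2 n3 = 0."""
    N = n1 + n2 + n3
    b = n1 - 2 * n2 - n3
    return (b + math.sqrt(b * b + 8 * N * n3)) / (2 * N)


def linkage_variance(x, N):
    """Leading-order sampling variance 2x(1-x)(2+x) / ((1+2x) N)."""
    return 2 * x * (1 - x) * (2 + x) / ((1 + 2 * x) * N)


def linkage_gamma1(x, N):
    """Leading-order skewness coefficient, as read from the text."""
    num = 2 - 2 * x - 5 * x**2 - 4 * x**3
    return num / math.sqrt(0.5 * x * (1 - x) * (1 + 2 * x) ** 3 * (2 + x) * N)


def linkage_gamma2(x, N):
    """Leading-order kurtosis coefficient, as read from the text."""
    num = 8 - 18 * x - 27 * x**2 + 19 * x**3 + 48 * x**4 + 24 * x**5
    return num / (x * (1 - x) * (2 + x) * (1 + 2 * x) ** 2 * N)


if __name__ == "__main__":
    # Counts equal to their expectations for x = 0.5, N = 400.
    print(linkage_mle(250, 100, 50))
    print(linkage_variance(0.5, 100), linkage_gamma1(0.5, 100), linkage_gamma2(0.5, 100))
```

When the class counts equal their expectations the root reproduces the generating value of $x$, which is a convenient sanity check on the quadratic.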
As a non-linear example consider the case where, of $N$ organisms subject to constant 
mortality, $n_1$ have died during the first of three equal intervals, $n_2$ during the second, and $n_3$ 
during the third, while $n_4$ survive. If $\xi$ be the probability of surviving through an interval, 

$$f_1(x) = 1 - x, \quad f_2(x) = x(1 - x), \quad f_3(x) = x^2(1 - x), \quad f_4(x) = x^3.$$

Thus 

$$x = \frac{n_2 + 2n_3 + 3n_4}{n_1 + 2n_2 + 3n_3 + 3n_4},$$

$$A_1 = (1 + x + x^2)[x(1 - x)]^{-1}, \qquad A_2 = (1 + 2x + 2x^2 - 8x^3)[x(1 - x)]^{-2},$$

$$A_3 = (1 + 8x + 9x^2 - 58x^3 + 43x^4)[x(1 - x)]^{-3}, \qquad B_1 = 2(1 + 2x)[x(1 - x)]^{-1},$$

$$B_2 = 2(3 + 7x - 13x^2)[x(1 - x)]^{-2}, \qquad D_1 = 6[x(1 - x)]^{-1}.$$

Thus the bias of $x$ is $-x(1 - x)(1 + 2x)(1 + x + x^2)^{-2}N^{-1}$, which is always negative, and the 
variance $x(1 - x)(1 + x + x^2)^{-1}N^{-1}$. Thus if $x = \tfrac12$ and $N = 100$, the bias is $-0.00163$ and the 
standard error 0.0378. It is barely worth while to correct for a bias of 4% of the standard 
error. 

$$\gamma_1 = [Nx(1 - x^3)]^{-1/2}(1 + x + x^2)^{-1}(1 - 4x - 4x^2 + 4x^3),$$

which changes sign when $x = 0.214$, and is only $-0.13$ when $x = \tfrac12$ and $N = 100$. 

$$\gamma_2 = N^{-1}(1 + x + x^2)^{-3}[x(1 - x)]^{-1}\,[1 - 18x + 12x^2 + 118x^3 - 57x^4 - 84x^5 + 28x^6 + 6x^7 + 3x^8] + O(N^{-2}).$$

$\gamma_2$ is negative if $x$ lies between 0.059 and 0.351. Over the rest of the range it is positive, and 
the maximum-likelihood estimate is somewhat less precise than a normally distributed one. 
When $x$ is small $\gamma_2$ approximates to $(Nx)^{-1}$, and when $1 - x$ is small, to $[3N(1 - x)]^{-1}$. When 
$x = \tfrac12$, $\gamma_2 = 1039/(343N) + O(N^{-2})$. Thus when $x = \tfrac12$ and $N = 100$, $\gamma_2 = +0.0303$. 
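
The figures quoted for the mortality example can be recovered directly from the general cumulant formulae. The sketch below is an illustration under stated assumptions: it takes $A_i = \Sigma a^{-i}b^{i+1}$, $B_i = \Sigma a^{-i}b^ic$, $D_i = \Sigma a^{-i}b^id$, and the leading terms $-\tfrac12N^{-1}A_1^{-2}B_1$ for the bias, $(NA_1)^{-1}$ for the variance, with $\gamma_1$ and $\gamma_2$ as in (6). The helper names are hypothetical.

```python
import math


def ml_leading_terms(a, b, c, d, N):
    """Leading-order bias, variance, gamma1, gamma2 of an ML estimate.

    a, b, c, d hold each class probability and its first three derivatives
    with respect to the parameter, evaluated at the true value.
    """
    def A(i):
        return sum(ar ** -i * br ** (i + 1) for ar, br in zip(a, b))

    def B(i):
        return sum(ar ** -i * br ** i * cr for ar, br, cr in zip(a, b, c))

    def D(i):
        return sum(ar ** -i * br ** i * dr for ar, br, dr in zip(a, b, d))

    A1, A2, A3, B1, D1 = A(1), A(2), A(3), B(1), D(1)
    bias = -0.5 * B1 / (N * A1**2)
    variance = 1.0 / (N * A1)
    gamma1 = (A2 - 3 * B1) / (math.sqrt(N) * A1**1.5)
    gamma2 = (-12 * B1 * (A2 - 2 * B1) + A1 * (A3 - 4 * D1) - 3 * A1**3) / (N * A1**3)
    return bias, variance, gamma1, gamma2


def mortality_model(x):
    """Class probabilities 1-x, x(1-x), x^2(1-x), x^3 and their derivatives."""
    a = [1 - x, x * (1 - x), x**2 * (1 - x), x**3]
    b = [-1.0, 1 - 2 * x, 2 * x - 3 * x**2, 3 * x**2]
    c = [0.0, -2.0, 2 - 6 * x, 6 * x]
    d = [0.0, 0.0, -6.0, 6.0]
    return a, b, c, d


if __name__ == "__main__":
    bias, variance, g1, g2 = ml_leading_terms(*mortality_model(0.5), N=100)
    print(bias, math.sqrt(variance), g1, g2)
```

With $x = \tfrac12$ and $N = 100$ the output should be close to the bias, standard error, $\gamma_1$ and $\gamma_2$ given in the text; any other model can be tried by supplying its own lists of probabilities and derivatives.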
Haldane (1953b) considered the estimation by maximum likelihood of several unknown 
parameters, and obtained an expression for the bias when two were estimated, as well as 
confirming the known expressions for their variances and covariance. The method used was 
the same as that employed in this paper. The calculation of the higher cumulants of the 
joint distribution of estimates is clearly, however, very complicated. 

Fisher (1921) showed that if $\phi$ is any regular monotonic function, and $x$ is the maximum-likelihood 
estimate of $\xi$, $\phi(x)$ is the maximum-likelihood estimate of $\phi(\xi)$. We can now ask 
the further question: For what functions is $\phi(x)$ an almost unbiased estimate of $\phi(\xi)$? 
Haldane (1953a) defined an almost unbiased estimate as one whose bias tends to zero with 
$N^{-2}$ or more rapidly as $N$ increases: 

$$\phi(x) = \phi(\xi + y) = \phi(\xi) + y\phi'(\xi) + \tfrac12y^2\phi''(\xi) + \dots$$

$$\begin{aligned}
\text{So}\quad \mathcal{E}[\phi(x)] &= \phi(\xi) + \phi'(\xi)\mathcal{E}(y) + \tfrac12\phi''(\xi)\mathcal{E}(y^2) + \dots \\
&= \phi(\xi) - \tfrac12N^{-1}A_1^{-2}B_1\phi'(\xi) + \tfrac12N^{-1}A_1^{-1}\phi''(\xi) + O(N^{-2}).
\end{aligned}$$

So the condition is 

$$\frac{\phi''(\xi)}{\phi'(\xi)} = A_1^{-1}B_1.$$

Hence 

$$\phi(\xi) = \int\exp\left[\int A_1^{-1}B_1\,d\xi\right]d\xi.$$





For example, in the linear case here considered $x$ is an almost unbiased estimate of $\xi$. In 
the non-linear case 

$$A_1^{-1}B_1 = \frac{2(1 + 2\xi)}{1 + \xi + \xi^2}, \qquad \phi(\xi) = E + F\xi(1 + \xi + \xi^2 + \tfrac12\xi^3 + \tfrac15\xi^4),$$

where $E$ and $F$ are any constants. 
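
The condition $\phi''/\phi' = A_1^{-1}B_1$ can be checked numerically for the non-linear example. The sketch below assumes $A_1^{-1}B_1 = 2(1 + 2\xi)/(1 + \xi + \xi^2)$ and takes $E = 0$, $F = 1$; the derivatives are approximated by central differences, and the names are illustrative.

```python
def phi(xi):
    """phi(xi) with E = 0 and F = 1 (assumed constants)."""
    return xi * (1 + xi + xi**2 + 0.5 * xi**3 + 0.2 * xi**4)


def ratio_numeric(xi, h=1e-4):
    """phi''(xi) / phi'(xi) by central differences."""
    d1 = (phi(xi + h) - phi(xi - h)) / (2 * h)
    d2 = (phi(xi + h) - 2 * phi(xi) + phi(xi - h)) / h**2
    return d2 / d1


def ratio_target(xi):
    """A1^-1 B1 for the mortality example, as read from the text."""
    return 2 * (1 + 2 * xi) / (1 + xi + xi**2)


if __name__ == "__main__":
    for xi in (0.2, 0.5, 0.8):
        print(xi, ratio_numeric(xi), ratio_target(xi))
```

The two ratios should agree to several decimals at each trial value of $\xi$.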

We shall briefly discuss the value of unbiased estimates in the branches of biology with 
which we are familiar. If the estimate, so to speak, stands by itself, a maximum-likelihood 
estimate is probably as useful as an unbiased one. The same is true when a function of the 
estimate will later be tabulated. If, on the other hand, the estimate is to be one of a large 
collection, say gene frequencies in populations, or relative viabilities of phenotypes, it is 
undesirable that the estimates should be consistently high or low, on the average, since 
a mean or median based on a large number of such estimates will have a bias which in some 
cases could be of the order of magnitude of its own standard error. It is therefore desirable 
that estimates which are subsequently to be tabulated should be unbiased. 


SUMMARY 


Expressions are given for the first four cumulants of a maximum-likelihood estimate. The 
bias is only negligible in some such estimates, when the sample number is of the order of 100. 
TESTS FOR RANDOMNESS OF POINTS ON A LINE 


By D. E. BARTON and F. N. DAVID 


University College London 


1. In previous papers (1956 a, b) we have discussed the distribution of various functions 
of ordered intervals any of which might be used as a test for the randomness of points on 
a line. In this present note we proceed a stage further and discuss approximate distributions 
of sums of ordered and unordered intervals. These distributions can be used to test the 
randomness of points on a line or as bivariate tests for randomness. The power function of 
any, or all, of these tests with respect to a specified alternative can be obtained to a similar 
degree of approximation and we propose to describe these in a further note. 

2. It is possible to rank intervals along a line according to their position from one end 
or according to their magnitude. We assume that on a line of unit length, $(n - 1)$ points are 
dropped randomly giving random intervals. Let the points be dropped at distances $\{x_i\}$ from 
one end $(i = 1, 2, \dots, (n - 1))$, with 

$$x_1 \le x_2 \le \dots \le x_{n-1}.$$

Write 

$$d_1 = x_1, \quad d_i = x_i - x_{i-1} \ (1 < i \le (n - 1)), \quad d_n = 1 - x_{n-1},$$

so that $d_i$ is the $i$th interval along the line. These $\{d_i\}$ can be given a rank according to their 
magnitude. Letting the smallest have rank 1 and the largest rank $n$ we may write these 
as $\{g_i\}$. The positional rank of the interval $g_i$ which is the $i$th in ascending order of size we write 
as $r_i$. The series $\{g_i\}$ which is merely a rearrangement of the series $\{d_i\}$ will be 

$$g_1 \le g_2 \le \dots \le g_n.$$

The magnitudinal rank of $d_i$ which we do not use except for computational rearrangement 
we may write as $r_i'$. We have then that 

$$d_{r_i} = g_i, \quad\text{and}\quad g_{r_i'} = d_i.$$

To fix ideas we have given a numerical example in Table 1. (This example is not ideal in 
that there are several intervals of the same size. Further investigation into the dates of the 
consecrations would have determined the exact magnitude of the intervals, but since we 
are illustrating only, we assigned intervals of the same size a random order.) Using the 
position and size of the ith interval in position and the position and size of the ith interval 
in magnitude, four criteria suggest themselves: 

n 

(i) R= Dir. 


i=1 


n 
i=s 


(iv) G* = nd dig 
i=1 
3. $R$ as a test for randomness has been discussed on several occasions, the first, we think, 
being by Hotelling & Pabst (1936) in a bivariate application. They showed that R is approxi- 
mately distributed as a Pearson Type I. Olds (1938) tabulated a related quantity for values 
of n from 2 to 10 after which the result that Rn? is normally distributed can be used. The 
exact distribution of Y is immediate. We have 


$$Y = \sum_{i=1}^{n} i\,d_i = n - \sum_{i=1}^{n-1} x_i.$$


The x, are (n— 1) independent rectangular variables, and since we consider them all the fact 
that they are ordered will play no part in the distribution of their sum. We have, using the 
result of Hall (1927) and Irwin (1927), that 
1 & : , 
eS — ] tl n-1f\ [yn —4— n—2 
[(n—k) > Y>(n—k-1); k = 1,2,...,(n—1)]. 
The exact distribution is not, however, of great importance, since the mean of (n— 1) 
independent rectangular variables tends very quickly to be normally distributed with 
increasing n. We have approximately that 
Puree 
n 


is a normal variable with Fo — 
Oh ae OE ge 
which is accurate enough for n > 25. For n small we may use the fact that the percentage 


points for a rectangular sum are embodied in Barton’s (1953) tables of y?. For 


12 /n+1 ® 
a1 ee “ Y) 
is exactly distributed as ¥?. If a one-tailed test is required we may use the fact that Y has 
a symmetrical distribution. The criterion Y will obviously be sensitive to departures from 
randomness of the type where the probability is changing along the line; in fact, as we shall 
discuss in a further paper, under certain circumstances it will be the most powerful test 
which can be used. It will not be sensitive to fluctuations of the type where a long interval 
tends to be followed by a short one. 
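
The criterion $Y$ is easily computed. The following sketch assumes $Y = \sum i\,d_i = n - \sum x_i$ and, for the normal approximation, uses the mean $(n + 1)/2$ and variance $(n - 1)/12$ that follow from treating the $x_i$ as $(n - 1)$ independent rectangular variables on the unit interval; these moments are derived here rather than quoted, and the function names are illustrative.

```python
import math


def y_statistic(points):
    """Y = sum_i i * d_i for points dropped on a line of unit length."""
    xs = sorted(points)
    n = len(xs) + 1  # number of intervals
    edges = [0.0] + xs + [1.0]
    intervals = [edges[i + 1] - edges[i] for i in range(n)]
    return sum((i + 1) * d for i, d in enumerate(intervals))


def y_normal_score(points):
    """Standardised Y under the null hypothesis of randomness."""
    n = len(points) + 1
    mean = (n + 1) / 2.0
    variance = (n - 1) / 12.0
    return (y_statistic(points) - mean) / math.sqrt(variance)


if __name__ == "__main__":
    pts = [0.1, 0.4, 0.7]
    print(y_statistic(pts), 4 - sum(pts))  # the two forms of Y
    print(y_normal_score(pts))
```

The first line of output shows the two forms of $Y$ side by side; for so few points the standardised score is only a rough guide, the text suggesting the normal form for $n > 25$.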

4. If we allow s to equal unity then G is Y. It occurred to us that the discussion of a 
partial sum of intervals, such as that defined by G, might be useful in that the zero intervals 
arising from lack of refined measurement could thereby be eliminated. (This lack of refined 
measurement will lead to a bias in that it may overemphasize the larger intervals, but on 
the whole this bias does not seem to be of serious proportions.) The derivation of the 
distribution of G is theoretically possible following the algebra developed in our two earlier 
papers (1956a,b). Since the derivation is the same, consider a more general form of G say 


t 
Gy = UNG (s<t<n). 
i=s 
Under the null hypothesis 7; and g; are independent and we have 
PU s%o41 = MNIsIst1-+° 9) a PUT S41 — 11) PIsIs+1 a 91) 


s—l 


O PTT era e+) Dy (— 1) 8-101 —(F +1) Ge— Gora — «++ —H-1— (n—t +1) 9)". 


j=0 








Substitute for g,: 
rt ; n—t+l1 
PTT sree Is Iovr + $1 Gat) LP" s%s41 ++ M1) Z pets Ye 1C; i|1- ( uk 
i= 


on” 
(n —t+l1)r, 
+a4( " 


n—t+l1 n—t+s—2 
=(G+0) +--+ 904(' Tt 7 1-1) : 


Integrate out for g,_;,...,g, in turn and at any stage the inequalities 


Ga—eIs—"s419s+1 — -+*—"h-19k—1 


Gy 
>9p>9r-v ZG >g9,>90 (kK=t—1,t-2,...,8+1 
MAM yt. +0, Ie>Ie-v | > 9: ( +1) 


=r 


l=s 


will hold. The conditional distribution for G,, given a set of ranks is thus proportional to 


t 
S(-14G ¥ 
j=1 w=1 
t t—-s—1 
(-y( sn) 
f=t—w+1 
Xt=s—wp t—-w — tf  \mir t-,.., .. ert Te Te 
| y ri(n—-t+w)-l > rt | po pga gdm pS n| 
l=1 Li=t-l-—w+l1 i=t—w+l1 v=1 Li=t—w+1 i=t—v+1 
t 
x |(m—t4+-w) 1, (ts —w4 J+) > n| 
f=t—wt1 
where (n—8+j+1) n—8 (n—t+w) n—8 
heji- ; Ga). ~-[4-—aw Sa 
a" f 
f=s f=t—w+l1 


The complete distribution of G,, will be obtained by summing this conditional distribution 
over all possible permutations of the ranks {r;}. Since we have been unable to simplify the 
conditional distribution we have not pursued the complete distribution as it will obviously 
be intractable. Previous work (Barton & David, 1956b) on the unweighted partial sums of 
the $g$'s showed that the normal approximation was fairly good for $n > 20$, and it might be 
anticipated that the same would be true for this present case. We show in §7 for certain 
special cases that $\beta_1$ and $\beta_2 - 3$ are of order $n^{-1}$. 

5. The moments of G,, may be derived from those we have already given (1956 a) for the 
powers and products of {g,} together with the moments of the finite population of natural 

: ‘ . 

numbers given in Table 2. Let M(g") = 6(g,-&(g,))' 


and «(g") the corresponding cumulant. We have 
t 
é( (Gy) “a 3 (n+1 ) EKG K(9;); 


where : 1 


9 l t 2 t 
6 G4) = “| Esa +d csr] +? [x 96% 6-I+5[E5«] -FE Cr], 
where 1 


whe i (n—j+1)2" 
In terms of the cumulants of the individual $g$'s this second crude moment may be written 


1)(2 t 
(n+ 23 n+ 3K K(g) + 


et! oi [Ex Cee, 


&(G2) = (n+ ets 


K(9i9x) 


i< 





Ble(gi)P. 


Values of the expected value of G, i.e. of $G_{st}$ when t = n, and of its standard error for different
values of n and s are given in Table 3 below. They are multiplied by $n^{-1}$ to facilitate
comparison with their asymptotic form.

6. The higher moments of G,, can also be obtained by the same method. Thus if 


S@ Ms...Mn x S(e™) .- S(u™) (¢—1) eee (¢-—t—q+ 1), 


w=s 





n 33 1 Y y Y y Y Y 
& (G3) = ve _ $ [2S + Sq, + 2SgS, + 4SE+ 45,15, +899 + 299 
+38, SP + SP + 6S} + (nm —t) (Sq — S(t) 82)], 
with a more complicated expression for the fourth crude moment which we do not reproduce
here. In the case in which we are particularly interested, that is, when t equals n, the moments
may be considerably simplified by expressing them as polynomials in $S((s-1)^{w})$, with
coefficients as rational functions of n and k = n - s + 1. The $S((s-1)^{w})$ are just the differences
of the appropriate polygamma function of n and k. Thus if


1+S((s—1)!) =1+F(n)—F(k) =7, (say), 
S((s—1)?) = —F(n)+F(k) = T, (say), 

(=2 a —1 

(w—1)! 


the central moments of G are (or follow from) 


and generally S((s—1)") = apa er »} 1, 


, n+l 
* 2n 


kT, 


fs = nS [T2(n? — 4nk — 3k) + nT, (3nk + 2k +n) + 2n(2n + 1)], 


(n+1)k ae 
bs = 4n3(n +2) [T,n®k(k +1) + 7, T,nk(n? —n—4nk— 2k) + nT,(n? — 5nk— 2k) 
+ T?k(3nk +2k— n?) ae 2n3], 
k 


My = 240n(n + 2) (wi 3) 4882" 4 138n2 + 23n — 12) + 4k(20n3 + 24n? — 5n— 6) 


+ 167 [n(9n? + 15n + 4) + k(15n3 + 21n — 4)] 

+ 4(T? + Ty) [2n?(n + 2) + nk(40n2 + 69n + 23) + k2(30n3 + 35n? — 11n — 12)] 
+ (14+ 67? T, + 373 + 87, T, + 67) [ — 2n?(n + 1) + nk(5n? + 9n + 2) 

+ 2nk?(15n? + 25n + 8) + k3(15n? + 15n? — 10n — 8)]}. 
7. We now consider the case where the ratio s/n is fixed and equal to p say. Let
$$q = 1 - p, \qquad Q = 1 - \log q.$$
Since the Euler-Maclaurin expansion of $T_u$ yields


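$$T_1 = Q + O\!\left(\frac{1}{n}\right), \qquad T_{u+1} = \frac{1}{u\,n^{u}}\left(q^{-u} - 1\right) + O\!\left(\frac{1}{n^{u+1}}\right) \quad (u \ge 1),$$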





we have for the cumulants of G 


Mi ="FQ+O), ky = T4(7-3q-+Q4(1 +4) +0(1), 


Ky = “P15 g? + 4(1 —5q + 29%) Q + 24(3q— 1) Q*] +0(1), 
k= ab [(537 — 1464q + 114092 — 24593) + 12Q(17 — 92g — 5q2 + 209°) 


+ 4Q?(5 — 271q + 9149? — 350g%) — 2Q4(1 + 4q — 16892 + 384q)] + O(1). 


These reduce to the values for the rectangular mean when q = 1 = Q. A comparison of
(i) the actual values of $n\beta_1$ and $n(\beta_2 - 3)$ calculated from the formulae of §6 with (ii) the first
term of their expansion as a power series in $n^{-1}$ using the formulae immediately above is
given in Table 3.
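
As a rough numerical check on the limiting column of Table 3, the sketch below evaluates the leading term of the mean divided by n. It rests on two readings of the damaged print that should be treated as assumptions: that the first cumulant is $\kappa_1 = \tfrac{1}{2} n q Q + O(1)$ with $Q = 1 - \log q$, and that the rows of Table 3 correspond to p = 1/5, 2/5, 3/5 and 4/5. The function name is illustrative only.

```python
import math


def limiting_mean_over_n(p):
    """Leading term of kappa_1 / n, assuming kappa_1 = n*q*Q/2 + O(1)."""
    q = 1.0 - p                # q = 1 - p
    big_q = 1.0 - math.log(q)  # Q = 1 - log q
    return 0.5 * q * big_q


# Limiting values as printed in the last column of Table 3 (mean / n).
# The p labels attached to them are an assumed reading of the row headings.
TABLE3_LIMITS = {0.2: 0.4893, 0.4: 0.4532, 0.6: 0.3833, 0.8: 0.2609}

if __name__ == "__main__":
    for p, printed in TABLE3_LIMITS.items():
        value = limiting_mean_over_n(p)
        print(f"p = {p:.1f}: q*Q/2 = {value:.4f}, Table 3 prints {printed:.4f}")
```

Under these assumptions the computed leading terms agree with the printed limits to within rounding of the fourth decimal.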

8. It is possible to obtain the moments of G using a moment-generating function of 
a certain simplicity. From our previous work, if W, denotes the weighted sum of g,, ..., 9; 
with weights r,, ...,7,, we have 


> E( Way*—1C_y* = é{(1 cm Way)} 


m=0 
k k <i 
. , tT; r; 
[ocean tceeae oan HS) (FE) 
n—t+1 n—t+27 ef Yn—s+1 ai Y n 








Hence if the {r;} are randomly drawn from the finite population of the first n natural numbers 
independently of the {g,}, if t = n, and 


S 1 
‘; = Feat oo FAG, 


we have 


&(G™) = 


(n+m—1)! <5 


mi(n—1)! * ,. 1 — 
3 Whe (5-3) 6m) 


where 
h,(;) = b> a; eee a, 
4, <i,<...<te 


is the homogeneous product sum of weight v. Let 


Eis.n = EFF; ---7y) (WS I<... <h). 


We can work out simply the corresponding h-functions which are seen to reduce to linear 


sums of the moments of the r's whose coefficients are polynomials in k. The E's are given as
functions of i, j, ... and the moments of the finite population in Table 4, as they are quite
general and of some intrinsic interest. The quantities


be -—59) 


are simply expressed in terms of the corresponding power sums (David & Kendall, V, 1955) 
which are just the {Z7,} and the moments of the finite population of the first n natural 
numbers (Table 2). The moments of G follow immediately after a little reduction. 

The asymptotic values for the mean and variance agree well enough for n = 25 for them 
to be used for n > 25 in forming the test function in place of the true moments based on the 
polygammas. Of course if tables of these last are easily available they may be used. 

9. The sum of squares of intervals along a line has been discussed by various authors, the 
first being Greenwood (1946). Approximations to its distribution have been suggested, but 
the exact distribution has not yet been obtained except in the cases n = 2,3 (Moran, 1947). 
It was therefore to be expected that the distribution of G* would be equally elusive, and 
this has in fact proved to be the case. The exact distributions of G* for n = 2,3 have been 
found, but they proved to be of much greater complexity than those found by Moran for the 
sum of squares. The moments, however, are not difficult. Straightforward algebra gives 


$$\mathcal{E}(G^*) = 1, \qquad \operatorname{var}(G^*) = \frac{n^2 + 7n - 6}{(n+1)(n+2)(n+3)}.$$
The shape constants are
$$\beta_1 = \frac{16(n+1)(n+2)(n+3)(n^3 + 51n^2 - 133n + 60)^2}{(n+4)^2 (n+5)^2 (n^2 + 7n - 6)^3} \quad (n > 2), \qquad \beta_1 = 0 \quad (n = 2),$$
$$\beta_2 = \frac{3(n+1)(n+2)(n+3)(n^5 + 56n^4 + 4639n^3 - 15364n^2 + 17172n - 5040)}{(n+4)(n+5)(n+6)(n+7)(n^2 + 7n - 6)^2} \quad (n \ge 4), \qquad \beta_2 = \frac{25}{9} \quad (n = 2).$$


$\beta_1$ for n = 2 and $\beta_2$ for n = 2, 3 were derived separately from the distributions and are not
obtainable from the general formula. $\beta_1$ and $\beta_2$ for various values of n were calculated.
They are found to behave in similar fashion to those of Greenwood's sum of squares; as
n increases they diverge from the normal point and then turn back, approaching normality
very slowly. Table 5 gives a selection of these values. Approximations to the percentage
points of these distributions can be made by using the appropriate Pearson curves with the
first four correct moments or Johnson's system. The percentage points actually quoted in
Table 6 were obtained by interpolation into Table 42 of Pearson & Hartley (1954), and
outside the range of the tables were estimated from a Johnson $S_U$ curve. The standardized
percentage points are also given to make interpolation easier for intermediate values of n.
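
The closed forms for the variance and shape constants of G* are easy to evaluate exactly. The following is a minimal sketch, assuming the formulae are read as var(G*) = (n² + 7n − 6)/((n+1)(n+2)(n+3)), with the β₁ expression applying for n ≥ 3 and the β₂ expression for n ≥ 4. The function names are illustrative and not taken from the paper.

```python
from fractions import Fraction


def var_gstar(n):
    """Variance of G* as read from the text."""
    return Fraction(n * n + 7 * n - 6, (n + 1) * (n + 2) * (n + 3))


def beta1_gstar(n):
    """beta_1 of G*; the general formula is assumed to apply for n >= 3."""
    if n < 3:
        raise ValueError("general formula for beta_1 assumed valid only for n >= 3")
    num = 16 * (n + 1) * (n + 2) * (n + 3) * (n**3 + 51 * n**2 - 133 * n + 60) ** 2
    den = (n + 4) ** 2 * (n + 5) ** 2 * (n**2 + 7 * n - 6) ** 3
    return Fraction(num, den)


def beta2_gstar(n):
    """beta_2 of G*; the general formula is assumed to apply for n >= 4."""
    if n < 4:
        raise ValueError("general formula for beta_2 assumed valid only for n >= 4")
    poly = n**5 + 56 * n**4 + 4639 * n**3 - 15364 * n**2 + 17172 * n - 5040
    num = 3 * (n + 1) * (n + 2) * (n + 3) * poly
    den = (n + 4) * (n + 5) * (n + 6) * (n + 7) * (n**2 + 7 * n - 6) ** 2
    return Fraction(num, den)


if __name__ == "__main__":
    for n in (4, 10, 40, 1000):
        print(n, float(var_gstar(n)), float(beta1_gstar(n)), float(beta2_gstar(n)))
```

Rounded to two decimals, the values produced this way can be compared directly with the entries of Table 5, and the square root of the variance gives the scale used for the standardized percentage points of Table 6.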

10. Earlier we remarked that the criterion Y could be expected to be sensitive to 
departures from randomness where the probability is changing monotonically along the 
unit line. This remark will also hold good for G. G*, on the other hand, like Greenwood’s 
sum of squares, may be expected to be sensitive to any type of departure from randomness 
in which the intervals become either too regular or too irregular. For interest, we have 
computed the four criteria which we mentioned in § 2 for the data in Table 1. 
We have

(i) R = 87770. Standardized R = 2.1350. Standardized R is approximately a unit normal
variable, and this value of R is therefore probably significant.

(ii) Y = 38.0693, Y' = 0.440, Mean Y' = 0.493, $\sigma_{Y'}$ = 0.03475, which makes the hypothesis
of randomness doubtful but not impossible, the equivalent normal deviate being -1.52.

(iii) For G we decided, for illustrative purposes, to cut off observations of magnitudinal
rank 13 and under, thus omitting one-fifth, approximately, of the observations.

G = 37.657 and for 68 observations in toto it appears legitimate to use the asymptotic
mean and variance. Thus, approximately,

Mean G = 33.272, $\sigma_G$ = 2.436, giving an equivalent normal deviate of 1.817 which is
possibly significant.

(iv) Finally, G* = 1.119, which is inside the upper 5 % level of significance as Table 6
shows.

All four criteria, although clearly not independent of each other, point in the same 
direction, that of possibly rejecting the hypothesis of randomness and of assuming that the 
probability may be changing with time. Other and simpler tests which could be devised 
will, no doubt, lead to the same conclusion. Which, among the many possible tests, one 


would use in practice can only be determined after a discussion of the power with respect to 
defined alternatives. 


Table 1. Dates of consecration of Archbishops of Canterbury from Thomas a Becket 





























l l | | 
| { } | } | | 
| lr.) | lr.) | |r, 
Year | X, | d, Ww | x | Year| x, | a, |")! of | vear| x, | a, |%)| 7; 
| | a J tJ | | $j 
| | | | 
| | | | 
we! o|..| . | 1366 | 20 | | 
116 112] 1| 37 | oe | 2 | 26 | | 1716 | 554) o, | 51 | 58 
1174, 12 | | - | ©. | 1368 | 206 | = | | 1737 | 575 
| 11 | 2] 34 ; 7| 27 | 27 10 | 52 | 30 
1185 | 23 | | | 1375 | 213) | | 5. | 1747 | 585 
| 6| 8| aa]. ; 6 | 28 | 25 10 | 53 | 33 
1191 | 29 | | 1381 | 219 | 1757 | 595 
| | @1 41 OE coe | ae | OO) Oe) aE 1 | 54] 6 
1193 | 31 | | 1397 | 235 : ~, | 1758 | 596 
: 12| 5 | 36 @B- BR Be & ba 10 | 55 | 32 
1205 | 43 | “< | 1414 | 252 | 7. | 1768 | 606 | 
eve 2| 6| 12 Ging? JE BF. : 15 | 56 | 47 
| 1207 | 45 | - | 1443 | 281 | | 1783 | 621 
| 21! 7] 57 9 | 32 | 29 , | 22 | 57 | 59 
| 1228 | 66 | | | 1452 | 290 | ; 1805 | 643 
i f4+.e) 4 | 2133] 9 . | 23 | 58 | 61 
1229 | 67 | 1454 | 292 7 | °° | o* | 1828 | 666 | 
| one | 2| 9| 1D eae Ee Oe | 20 | 59 | 54 
1231 | 69| ¢ | | -. | 1486 | 324 | “7; “" | °° | 1848 | 686 | 
| 3 | 10) 15 8 ‘ | 25 | 35 | 64 14 | 60 | 40 
| 12384) 72) 13 | aa | 35 | 1521 | 349 | “5 | 96 | 19 | 1862 | 700 | “6 | gi | 90 
}1245 | 83 | 5. | 4) | 2°] 1513 | 351 | > | 36 | 1° | ises | 706 | Baa 
ee | 25 | 12 | 65 i bee ee ee ; 15 | 62 | 44 
1270 | 108 | 1533 | 371 |<, | ° | 1883 | 721 
A 3|13| 16] ,- oa, | 23 | 38 | 62 x 13 | 63 | 38 
1273 | 111 | © | on | 1556 | 394 | a | 1896 | 734 
51M i RT. bh | a el a 4 7 | 64 | 26 
1278 116 Bee 1559 | 397 : | 1903 | 741 | .. | 2. 
ia 62 oo , | 16 40, 49 | 3, | 25 | 65 | 63 
1279 | 117 15 | 16 | 43 1575 , 413 8 | 41 | 28 1928 | 766 | 14 | 66 | 41 
| Agee | Bae |. | ow | cme eee 1 eee | | — | 1942 | 780 | x 
TPs: 19 | 17 | 52 | 21 | 42 | 56 e , | 3 | 67) 14 
1313 | 151 | | 1604 | 442 | 3 | on | 1945 | 783 | 
cog th ex Se Tae | 6 | 43 | 23 2 10 | 68 | 31 
1313 | 151 | ,. | | 1610 | 448 1955 | 793 
oo 15| 19 | 46] 7... | --. | 23 | 44 | 60 
| 1328 | 166 | | | oo pee | a i ST 
ot is 5 | 20, 21 27 | 45 | 66 
1333 | 171 1660 | 498 | ~: 
| | | 15 | 21 | 45 af Tee 3 | 46 | 18 
| 1348 | 186 | a | 1663 | 501 | 
: ~ | 1|22| 56 | 15 | 47 | 42 
1349 | 187 | bag 1678 | 516 ; 
reilly 0/23] 2i,. oo | 13 | 48 | 39 
1349 | 187 | Pag 1691 | 529 | 
1366 | 204 | 17 | 24 | 5° | i695 | 533 | ,* | 49 | 19 
| | 0| 25} 3 | 21 | 50 | 55 








Note. $X_i = 793x_i$, where $x_i$ is defined in §2.
Table 2. The crude moments of the finite population of natural numbers 1, 2, ..., n

$$\mu'_1 = \frac{n+1}{2}, \qquad \mu'_2 = \frac{(n+1)(2n+1)}{6}, \qquad \mu'_{11} = \frac{(n+1)(3n+2)}{12},$$
$$\mu'_3 = \frac{n(n+1)^2}{4}, \qquad \mu'_{21} = \frac{n(n+1)^2}{6}, \qquad \mu'_{111} = \frac{n(n+1)^2}{8},$$
$$\mu'_4 = \frac{(n+1)}{30}(6n^3 + 9n^2 + n - 1), \qquad \mu'_{31} = \frac{(n+1)}{120}(15n^3 + 21n^2 - 4), \qquad \mu'_{22} = \frac{(n+1)}{180}(20n^3 + 24n^2 - 5n - 6),$$
$$\mu'_{211} = \frac{(n+1)}{360}(30n^3 + 35n^2 - 11n - 12), \qquad \mu'_{1111} = \frac{(n+1)}{240}(15n^3 + 15n^2 - 10n - 8).$$
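
The entries of Table 2 can be checked by brute-force enumeration. The sketch below assumes that a moment such as $\mu'_{21}$ denotes the expectation of $r_1^2 r_2$ for distinct members drawn without replacement from 1, 2, ..., n, and that the eleven entries are read as coded in `table2_formulae`. Both helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def crude_moment(n, powers):
    """E[r_1^a_1 * ... * r_m^a_m] for distinct draws from 1..n, by enumeration."""
    draws = list(permutations(range(1, n + 1), len(powers)))
    total = Fraction(0)
    for draw in draws:
        term = 1
        for value, power in zip(draw, powers):
            term *= value**power
        total += term
    return total / len(draws)


def table2_formulae(n):
    """Closed forms of Table 2 as read from the print (an assumed reading)."""
    return {
        (1,): Fraction(n + 1, 2),
        (2,): Fraction((n + 1) * (2 * n + 1), 6),
        (1, 1): Fraction((n + 1) * (3 * n + 2), 12),
        (3,): Fraction(n * (n + 1) ** 2, 4),
        (2, 1): Fraction(n * (n + 1) ** 2, 6),
        (1, 1, 1): Fraction(n * (n + 1) ** 2, 8),
        (4,): Fraction((n + 1) * (6 * n**3 + 9 * n**2 + n - 1), 30),
        (3, 1): Fraction((n + 1) * (15 * n**3 + 21 * n**2 - 4), 120),
        (2, 2): Fraction((n + 1) * (20 * n**3 + 24 * n**2 - 5 * n - 6), 180),
        (2, 1, 1): Fraction((n + 1) * (30 * n**3 + 35 * n**2 - 11 * n - 12), 360),
        (1, 1, 1, 1): Fraction((n + 1) * (15 * n**3 + 15 * n**2 - 10 * n - 8), 240),
    }


if __name__ == "__main__":
    n = 6
    for powers, closed_form in table2_formulae(n).items():
        print(powers, closed_form, crude_moment(n, powers) == closed_form)
```

Exact rational arithmetic is used so that agreement is an identity rather than a floating-point coincidence; n must be at least 4 for the fourth-order product moment to be defined.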


Table 3. Values of μ'₁/n, μ₂/n, nβ₁ and n(β₂ - 3) of the distribution of G for different
n and p (= s/n) together with their limiting forms

              p     n = 5     10        15        20        25        ∞

μ'₁/n        1/5    0.6000    0.5445    0.5261    0.5168    0.5113    0.4893
             2/5    0.5760    0.5144    0.4940    0.4838    0.4776    0.4532
             3/5    0.5220    0.4525    0.4294    0.4179    0.4109    0.3833
             4/5    0.4280    0.3458    0.3179    0.3038    0.2953    0.2609

μ₂/n         1/5    0.06667   0.07960   0.08033   0.08205   0.08309   0.08724
             2/5    0.07493   0.08752   0.09172   0.09383   0.09510   0.10022
             3/5    0.09238   0.10626   0.11082   0.11309   0.11445   0.11989
             4/5    0.11316   0.12315   0.12570   0.12680   0.12740   0.12936

nβ₁          1/5    0.000     0.007     0.013     0.016     0.018     0.029
             2/5    0.069     0.141     0.174     0.192     0.204     0.259
             3/5    0.289     0.480     0.571     0.624     0.659     0.824
             4/5    0.734     1.293     1.587     1.772     1.846     2.516

n(β₂ - 3)    1/5   -1.500    -1.733    -1.772    -1.814    -1.839    -1.975
             2/5   -2.470    -2.708    -2.809    -2.868    -2.958    -3.087
             3/5   -2.926    -2.809    -2.816    -2.814    -2.814    -2.815
             4/5   -2.082    -1.506    -1.184    -0.978    -0.836    -0.288











Table 4. The crude moments of members of a sequence of partial means (population finite)

$$E_i = \mu'_1,$$
$$E_{ij} = \frac{1}{j}\left[\mu'_2 + (j-1)\mu'_{11}\right] \quad (i \le j),$$
$$E_{ijk} = \frac{1}{jk}\left[\mu'_3 + \{2(j-1) + (k-1)\}\mu'_{21} + (j-1)(k-2)\mu'_{111}\right] \quad (i \le j \le k),$$
$$E_{ijkl} = \frac{1}{jkl}\Big[\mu'_4 + \{2(j-1) + (k-1) + (l-1)\}\mu'_{31} + \{2(j-1) + (k-1)\}\mu'_{22}$$
$$\qquad + \{3(j-1)(k-2) + 2(j-1)(l-2) + (k-1)(l-2)\}\mu'_{211} + (j-1)(k-2)(l-3)\mu'_{1111}\Big] \quad (i \le j \le k \le l).$$
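
The second and third entries of Table 4 can be checked in the same enumerative way. The sketch below assumes that $E_{ij}$ and $E_{ijk}$ denote expectations of products of partial means, where the j-th partial mean is the average of the first j members of a random ordering of 1, 2, ..., n. That interpretation, the reading of the leading factors as 1/j and 1/(jk), and all function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def crude_moment(n, powers):
    """E[r_1^a_1 * ... * r_m^a_m] for distinct draws from 1..n, by enumeration."""
    draws = list(permutations(range(1, n + 1), len(powers)))
    total = Fraction(0)
    for draw in draws:
        term = 1
        for value, power in zip(draw, powers):
            term *= value**power
        total += term
    return total / len(draws)


def mean_product_of_partial_means(n, indices):
    """Average over all orderings of the product of the partial means named in indices."""
    orderings = list(permutations(range(1, n + 1)))
    total = Fraction(0)
    for ordering in orderings:
        term = Fraction(1)
        for j in indices:
            term *= Fraction(sum(ordering[:j]), j)  # mean of the first j members
        total += term
    return total / len(orderings)


def e_ij(n, j):
    """Table 4 entry for two indices i <= j (it does not depend on i)."""
    return (crude_moment(n, (2,)) + (j - 1) * crude_moment(n, (1, 1))) / j


def e_ijk(n, j, k):
    """Table 4 entry for three indices i <= j <= k (it does not depend on i)."""
    m3 = crude_moment(n, (3,))
    m21 = crude_moment(n, (2, 1))
    m111 = crude_moment(n, (1, 1, 1))
    return (m3 + (2 * (j - 1) + (k - 1)) * m21 + (j - 1) * (k - 2) * m111) / (j * k)


if __name__ == "__main__":
    n = 5
    print(mean_product_of_partial_means(n, (2, 3)), e_ij(n, 3))
    print(mean_product_of_partial_means(n, (1, 2, 4)), e_ijk(n, 2, 4))
```

With n = 5 the enumeration runs over only 120 orderings, so the comparison is immediate.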
Table 5. β₁, β₂ for the distribution of G*, sample size increasing

n      2      3      4      5      6       7       8       9       10      20      40     100    400    1000

β₁     0      0.96   1.97   2.66   3.08    3.30    3.38    3.36    3.29    2.06    0.90   0.26   0.05   0.02
β₂     2.78   4.79   7.17   9.16   10.66   11.74   12.46   12.92   13.17   11.54   7.67   4.51   ...    3.09
Table 6. Percentage points of G*
[G*' = standardized G*.]

Percentage         n = 40            n = 100           n = 400           n = 1000
point             G*      G*'       G*      G*'       G*      G*'       G*      G*'

Upper 0.5        1.591    3.71     1.325    3.24     1.142    2.84      ...     ...
      1          1.488    3.06     1.278    2.77     1.126    2.51     1.077    2.43
      2.5        1.362    2.27     1.217    2.17     1.103    2.07     1.064    2.02
      5          1.273    1.72     1.172    1.72     1.085    1.69     1.053    1.68
Lower 5          0.776   -1.41     0.849   -1.51     0.921   -1.58     0.949   -1.61
      2.5        0.726   -1.72     0.819   -1.80     0.906   -1.87     0.940   -1.90
      1          0.657   -2.16     0.782   -2.17     0.889   -2.21     0.929   -2.25
      0.5        0.602   -2.50     0.755   -2.44     0.878   -2.44     0.921   -2.48
PAIRED COMPARISON DESIGNS FOR TESTING
CONCORDANCE BETWEEN JUDGES†


By R. C. BOSE 


University of North Carolina and Division of Research Technique 
London School of Economics and Political Science 


1. SUMMARY AND INTRODUCTION 


In a recent paper, ‘Further contributions to the theory of paired comparisons’, M. G. 
Kendall (1955) considers paired comparison designs, in which each pair of judges have 
certain comparisons in common. Such designs should prove useful for testing concordance 
between judges. He notes that designs of an optimum kind which balance by numbers of 
comparisons, objects compared, numbers of observers on given comparisons and so forth 
are rather rare. It is the object of this paper to obtain some paired comparison designs which 
have a high degree of symmetry. These designs have been defined in §2, and certain 
inequalities between the parameters are obtained in §3. In §§4 and 5, two special classes
of these designs have been investigated, and explicit designs for small values of n (the
number of objects to be compared) have been given in Tables 1 and 2. The method of analysis
would, to a certain extent, depend on what use the experimenter wants to make of the design.
This question will be considered in a future communication. 

My thanks are due to Professor Kendall for suggesting the problem, and for helpful 
discussion during the preparation of the paper. 


2. DEFINITION OF LINKED PAIRED COMPARISON DESIGNS 


Suppose it is required to compare n objects, by employing v judges. Each judge compares
r pairs of objects (r > 1), and in respect of each pair expresses his opinion whether he prefers
one or the other object of the pair. In certain circumstances it may be desirable to allow the
judge to express no preference with respect to either of the objects forming the pair. In this
case we say that the preference is equally shared by the two objects. We shall assume that
the pairs compared by any judge are all different. To ensure symmetry between objects and
judges, we require that:

(a) Among the r pairs compared by each judge, each object appears equally often,
say $\alpha$ times.

(b) Each pair is compared by k judges, k > 1.

(c) Given any two judges there are exactly $\lambda$ pairs which are compared by both judges.

Designs satisfying these conditions may be called linked paired comparison designs.

Clearly $r = \tfrac{1}{2}n\alpha$. (2.1)
The number of possible pairs is $b = \tfrac{1}{2}n(n-1)$. (2.2)
There is a certain correspondence between linked paired comparison designs and balanced 


incomplete block designs. Each judge may be considered to correspond to a treatment, and 


† This research was jointly supported by the United States Air Force through the Office of Scientific
Research of the Air Research and Development Command, and the Division of Research Techniques, 
London School of Economics and Political Science. 
each pair to a block. If a pair is compared by a judge, then the block corresponding to the
pair may be considered to contain the treatment corresponding to the judge. Hence if
a linked paired comparison design of the type considered exists there must exist a
corresponding balanced incomplete block design with v treatments and $b = \tfrac{1}{2}n(n-1)$ blocks,
such that each block contains k treatments, each treatment occurs in r blocks, and two given
treatments occur together in $\lambda$ blocks. It follows at once that

$bk = vr, \qquad \lambda(v-1) = r(k-1)$. (2.3)

Using (2.1) and (2.2), the first of the relations (2.3) can be written as
$v\alpha = k(n-1)$. (2.4)

Also Fisher's well-known inequality (Fisher, 1940; Bose, 1949) gives
$b \ge v$ or $r \ge k$. (2.5)
Therefore $n\alpha \ge 2k$. (2.6)

It should be remembered that the existence of a balanced incomplete block design with
parameters v, $b = \tfrac{1}{2}n(n-1)$, r, k, $\lambda$ does not ensure the existence of a corresponding linked
paired comparison design, owing to the additional restriction (a). Clearly

$r \ge \lambda$. (2.7)

The case $r = \lambda$ is trivial, since in this case all the r pairs compared by any judge must also
be compared by every other judge. This means that each judge compares the same pairs.
Condition (b) then shows that there must be exactly k judges, and each judge compares
every pair so that $r = \tfrac{1}{2}n(n-1)$.


3. SOME INEQUALITIES 


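From (2.1) and (2.4)
$$v = \frac{k}{\alpha}\left(\frac{2r}{\alpha} - 1\right).$$
Hence
$$\lambda = \frac{r(k-1)}{v-1} = \frac{\alpha^2 r(k-1)}{2kr - k\alpha - \alpha^2} = \frac{\alpha^2}{2} + \frac{\alpha^2(k\alpha + \alpha^2 - 2r)}{2(2kr - k\alpha - \alpha^2)}. \tag{3.1}$$

This shows that for a given k, $\lambda$ decreases as r increases. Now $r \ge \lambda$, and when $r = \lambda$, we
get from (3.1), $r = \lambda = \tfrac{1}{2}\alpha(\alpha+1)$.

This proves that $r \ge \tfrac{1}{2}\alpha(\alpha+1), \quad \lambda \le \tfrac{1}{2}\alpha(\alpha+1)$, (3.2)
where the equality holds only in the trivial case $r = \lambda$.

Since $\lambda$ must be a positive integer, (3.2) shows that $\alpha = 1$ must imply $r = \lambda = \alpha = 1$.
Each judge therefore compares only one pair. Hence the case $\alpha = 1$ is impossible except
in the trivial case when there are only two objects and each judge compares them.

Again we can write (3.1) as
$$\lambda = \frac{(k-1)\alpha^2}{2k} + \frac{\alpha^2(k-1)(k\alpha + \alpha^2)}{2k(2kr - k\alpha - \alpha^2)}. \tag{3.3}$$
It follows from (3.2) that $2kr - k\alpha - \alpha^2 \ge (k-1)\alpha^2$.
Hence the second term in (3.3) is positive. We thus have the inequality

$\lambda > \tfrac{1}{2}(k-1)\alpha^2/k$. (3.4)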
Since $k \ge 2$, combining (3.4), (3.2) and (3.1), we have

$\tfrac{1}{4}\alpha^2 < \lambda \le \tfrac{1}{2}\alpha(\alpha+1)$, (3.5)
$$\frac{d\lambda}{dk} = \frac{\alpha^2 r(2r - \alpha - \alpha^2)}{(2kr - k\alpha - \alpha^2)^2}. \tag{3.6}$$
It follows from (3.2) that $d\lambda/dk \ge 0$. (3.7)

Hence $\lambda$ is a monotonically increasing function of k, for a given r.
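
The relations of this section are easily evaluated for given r, k and α. The following minimal sketch assumes the readings v = k(2r − α)/α² and λ = α²r(k − 1)/(2kr − kα − α²), together with the bounds ¼α² < λ ≤ ½α(α + 1); the function names are illustrative only.

```python
from fractions import Fraction


def judges_and_lambda(r, k, alpha):
    """Return (v, lambda) implied by r, k and alpha under the assumed formulae."""
    v = Fraction(k * (2 * r - alpha), alpha**2)
    lam = Fraction(alpha**2 * r * (k - 1), 2 * k * r - k * alpha - alpha**2)
    return v, lam


def satisfies_bounds(lam, alpha):
    """Check alpha^2 / 4 < lambda <= alpha (alpha + 1) / 2."""
    return Fraction(alpha**2, 4) < lam <= Fraction(alpha * (alpha + 1), 2)


if __name__ == "__main__":
    # Two parameter sets that appear in the paper:
    # (r, k, alpha) = (6, 4, 2) for n = 6 and (12, 3, 3) for n = 8.
    for r, k, alpha in ((6, 4, 2), (12, 3, 3)):
        v, lam = judges_and_lambda(r, k, alpha)
        print(r, k, alpha, v, lam, satisfies_bounds(lam, alpha))
```

A parameter set for which v or λ fails to be a positive integer, or for which the bounds fail, cannot correspond to a linked paired comparison design.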


4. LINKED PAIRED COMPARISON DESIGNS WITH $\alpha = 2$


The inequality (3.5) shows that when $\alpha = 2$ (neglecting the trivial case $r = \lambda$), we must
have $\lambda = 2$. Using (2.1), (2.2), (2.3) and (2.4), we see that all the parameters of the design
can be expressed in terms of n, the number of objects to be compared. In fact we have

$v = \tfrac{1}{2}(n-1)(n-2), \quad b = \tfrac{1}{2}n(n-1), \quad r = n, \quad k = n-2, \quad \lambda = 2, \quad \alpha = 2$. (4.1)

The existence of (4.1) implies the existence of the balanced incomplete block design with
parameters

$v = \tfrac{1}{2}(n-1)(n-2), \quad b = \tfrac{1}{2}n(n-1), \quad r = n, \quad k = n-2, \quad \lambda = 2$. (4.2)

The balanced incomplete block design (4.2) is known to exist for the values n = 4, 5, 6
and 9 (Fisher & Yates, 1938; Bose, 1939), and can in fact be derived by first writing down
a solution of the symmetrical balanced incomplete block design,

$v = b = \tfrac{1}{2}n(n-1) + 1, \quad r = k = n, \quad \lambda = 2$, (4.3)

and then deleting one block and all the treatments in this block. This results in two
treatments being deleted from each of the other $\tfrac{1}{2}n(n-1)$ blocks, since any block of (4.3) has
exactly $\lambda = 2$ treatments in common with every other block. Conversely, it is known that the
existence of (4.2) implies the existence of (4.3) (Connor, 1952; Connor & Hall, 1954). Hence
the non-existence of (4.3) for any value of n implies the non-existence of (4.2) for the same
value of n. Certain sufficient conditions for the non-existence of (4.3) are known
(Shrikhande, 1950; Chowla & Ryser, 1950). In particular, the cases n = 7, 8 and 10 are impossible.
The design (4.2) is called the derived of (4.3).

We shall now show that for any solution of the symmetrical balanced incomplete block
design (4.3), we can derive a corresponding solution of the linked paired comparison design
(4.1). The process of derivation will first be illustrated by considering the special case n = 6.

It is known (Carmichael, 1937) that a solution for (4.3) in the special case n = 6 can be
obtained by writing down sixteen treatments in the cells of a 4 x 4 square and then taking
for blocks the six treatments which occur in the same row or the same column as a given
treatment (but not including in the block this treatment itself). We shall take the sixteen
treatments to be 0, 1, 2, 3, 4, 5, 6 and a, b, c, d, e, f, g, h, k and arrange them as shown below:

    0   1   2   3
    4   a   b   c
    5   d   e   f          (4.4)
    6   g   h   k
Then the 16 blocks of (4.3) for the case n = 6 are given by

    1, 2, 3, 4, 5, 6
    1, 2, | 0, c, f, k
    1, 3, | 0, b, e, h
    1, 4, | b, c, d, g
    1, 5, | a, g, e, f
    1, 6, | a, d, h, k
    2, 3, | 0, a, d, g
    2, 4, | a, c, e, h
    2, 5, | d, f, b, h          (4.5)
    2, 6, | e, b, g, k
    3, 4, | a, b, f, k
    3, 5, | c, k, d, e
    3, 6, | c, f, g, h
    4, 5, | 0, g, h, k
    4, 6, | 0, d, e, f
    5, 6, | 0, a, b, c

A solution of the derived design (4.2) for the case n = 6 is now obtainable from (4.5)
by omitting the first block in (4.5) and the treatments 1, 2, 3, 4, 5, 6 from the remaining
blocks. Thus the derived design is the part to the right of the vertical rule in (4.5). We can make
a (1, 1) correspondence between the blocks of the derived design and the unordered pairs
(i, j), i, j = 1, 2, 3, 4, 5, 6; i ≠ j, since each of these blocks has been obtained by deleting just
one such pair from the corresponding block of (4.3). Thus the block 0, c, f, k corresponds to
the pair (1, 2) and so on. Now let us identify the ten treatments 0, a, b, c, d, e, f, g, h and k of
the derived with 10 judges, and assign to each judge the pairs corresponding to the blocks
in which the treatment corresponding to the judge occurs. We thus get the linked paired
comparison design no. (3) of Table 1. Since each of the treatments 0, a, b, c, d, e, f, g, h, k
occurs in exactly r = 6 blocks of the derived, each judge is assigned exactly six pairs, and
since any pair of treatments occurs in exactly $\lambda = 2$ blocks, any two judges have just two
pairs in common. Again, since each block of the derived has exactly k = 4 treatments, each
pair is compared by just 4 judges. Finally, let i be one of the six deleted treatments 1, 2, 3,
4, 5, 6 and let x stand for one of the ten treatments of the derived. Then i and x occur together
in only two blocks of the symmetrical design; for example, 1 and a occur together only in
the blocks (1, 5, a, g, e, f) and (1, 6, a, d, h, k). Hence the object i occurs twice among the
pairs compared by the judge x. This shows that each of the six objects 1, 2, 3, 4, 5, 6 occurs
twice among the pairs compared by a judge.
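
The derivation for n = 6 can be followed mechanically. The sketch below assumes the 4 x 4 arrangement with rows (0, 1, 2, 3), (4, a, b, c), (5, d, e, f), (6, g, h, k), builds the sixteen blocks by the row-and-column rule, deletes the block belonging to treatment 0, and assigns pairs to judges. The function names and the data layout are illustrative choices rather than anything given in the paper.

```python
from itertools import combinations

# Assumed arrangement of the sixteen treatments in the 4 x 4 square.
SQUARE = [
    ["0", "1", "2", "3"],
    ["4", "a", "b", "c"],
    ["5", "d", "e", "f"],
    ["6", "g", "h", "k"],
]


def symmetric_blocks(square):
    """Block of a treatment = the other treatments in its row or its column."""
    size = len(square)
    blocks = {}
    for row in range(size):
        for col in range(size):
            row_mates = [square[row][j] for j in range(size) if j != col]
            col_mates = [square[i][col] for i in range(size) if i != row]
            blocks[square[row][col]] = set(row_mates + col_mates)
    return blocks


def linked_design(blocks, deleted_key):
    """Delete one block; each remaining block yields one pair and four judges."""
    objects = blocks[deleted_key]
    judges = {}
    for key, block in blocks.items():
        if key == deleted_key:
            continue
        pair = frozenset(block & objects)  # the two deleted treatments
        for judge in block - objects:      # treatments kept in the derived design
            judges.setdefault(judge, set()).add(pair)
    return judges


if __name__ == "__main__":
    design = linked_design(symmetric_blocks(SQUARE), "0")
    for judge in sorted(design):
        print(judge, sorted(tuple(sorted(pair)) for pair in design[judge]))
    common = {len(design[x] & design[y]) for x, y in combinations(design, 2)}
    print("pairs in common between two judges:", common)
```

Printing the design gives, for each of the ten judges, six pairs in which every object occurs twice, and any two judges share exactly two pairs, in agreement with r = 6, α = 2 and λ = 2.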





The same process can be used for obtaining a solution of (4.1) for any value of n for which
a solution of the symmetrical balanced incomplete block design (4.3) is known. We first
rename the treatments in such a way that the first block contains the treatments 1, 2, ..., n.
Then each of the remaining $\tfrac{1}{2}n(n-1)$ blocks contains just one of the unordered pairs (i, j),
i, j = 1, 2, ..., n, i ≠ j. The remaining $\tfrac{1}{2}(n-1)(n-2)$ treatments may then be identified
with judges. If x be one of these treatments, then the judge x is assigned the pairs (i, j)
corresponding to those blocks in which x occurs.

Cyclic solutions to the symmetrical balanced incomplete block designs (4.3) are known
for the cases n = 4, 5 and 9. All blocks can be obtained by developing a suitable initial
block mod v. These initial blocks are given below:

    n    Initial block
    4    (0, 3, 5, 6) mod 7
    5    (1, 4, 5, 9, 3) mod 11
    9    (1, 16, 34, 26, 9, 33, 10, 12, 7) mod 37

The corresponding linked paired comparison designs are the designs nos. (1), (2) and (4)
of Table 1.
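
That the three initial blocks do generate symmetrical designs with λ = 2 can be confirmed by developing them cyclically. The following short sketch is offered as an illustration; the helper names are hypothetical.

```python
from itertools import combinations

# Initial blocks quoted in the text, keyed by n, with their moduli v.
INITIAL_BLOCKS = {
    4: ((0, 3, 5, 6), 7),
    5: ((1, 4, 5, 9, 3), 11),
    9: ((1, 16, 34, 26, 9, 33, 10, 12, 7), 37),
}


def develop(initial, modulus):
    """Add 0, 1, ..., modulus - 1 to every element of the initial block."""
    return [frozenset((x + shift) % modulus for x in initial) for shift in range(modulus)]


def blocks_meet_in(blocks, lam):
    """True if every two distinct blocks have exactly lam treatments in common."""
    return all(len(a & b) == lam for a, b in combinations(blocks, 2))


if __name__ == "__main__":
    for n, (initial, modulus) in INITIAL_BLOCKS.items():
        blocks = develop(initial, modulus)
        print(n, modulus, len(set(blocks)), blocks_meet_in(blocks, 2))
```

Each developed family has v = ½n(n − 1) + 1 distinct blocks of size n, and any two blocks meet in exactly two treatments, which is the property used when one block is deleted to form the derived design.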




5. SOME OTHER TYPES OF LINKED PAIRED DESIGNS 


Let the number of objects n be even, say n = 2t. Then we can divide the t(2t - 1) pairs into
2t - 1 sets of t pairs each, such that each object occurs exactly once among the pairs of a set.
For example, if n = 8, we can take the objects to be 0, 1, 2, 3, 4, 5, 6 and ∞. Then the seven
sets are:

    Sets   Pairs
    I      (1, 6), (2, 5), (3, 4), (0, ∞)
    II     (2, 0), (3, 6), (4, 5), (1, ∞)
    III    (3, 1), (4, 0), (5, 6), (2, ∞)
    IV     (4, 2), (5, 1), (6, 0), (3, ∞)          (5.1)
    V      (5, 3), (6, 2), (0, 1), (4, ∞)
    VI     (6, 4), (0, 3), (1, 2), (5, ∞)
    VII    (0, 5), (1, 4), (2, 3), (6, ∞)

In the general case, the 2t - 1 sets can be obtained by developing mod (2t - 1) the initial set
(1, 2t - 2), (2, 2t - 3), ..., (t - 1, t), (0, ∞), (5.2)
the object ∞ remaining unchanged.

Let us now take a balanced incomplete block design with v* treatments, b* = 2t - 1
blocks, r* replications, block size k* and in which every pair of treatments occurs together
in the same block $\lambda^*$ times, and make each block correspond to one set and each treatment
correspond to one judge. We can then obtain a linked paired comparison design by assigning
to each judge the sets of pairs corresponding to all blocks in which the treatment
corresponding to the judge occurs. We obtain in this way a linked paired comparison design
with parameters

$n = 2t, \quad v = v^*, \quad b = t(2t-1), \quad r = tr^*, \quad k = k^*, \quad \lambda = t\lambda^*, \quad \alpha = r^*$. (5.3)

For example, in the case n = 8, we may start with the balanced incomplete block design
with parameters $v^* = 7, \ b^* = 7, \ r^* = 3, \ k^* = 3, \ \lambda^* = 1$.
If the treatments are taken as a, b, c, d, e, f, g, then the blocks are

    a, b, d
    b, c, e
    c, d, f
    d, e, g          (5.4)
    e, f, a
    f, g, b
    g, a, c

If we make them correspond to the sets I-VII given by (5.1), then we get the following
linked paired comparison design:

    Judge   Sets of pairs
    a       I, V, VII
    b       II, VI, I
    c       III, VII, II
    d       IV, I, III          (5.5)
    e       V, II, IV
    f       VI, III, V
    g       VII, IV, VI


Table 1

No.   Parameters                  Judge   Pairs assigned to a judge

(1)   n = 4, v = 3, b = 6,        a       (1, 4), (1, 3), (2, 4), (2, 3)
      r = 4, k = 2, λ = 2,        b       (1, 3), (2, 4), (1, 2), (3, 4)
      α = 2                       c       (1, 4), (1, 2), (2, 3), (3, 4)

(2)   n = 5, v = 6, b = 10,       (judges and pairs not legible)
      r = 5, k = 3, λ = 2,
      α = 2

(3)   n = 6, v = 10, b = 15,      0       (1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)
      r = 6, k = 4, λ = 2,        a       (2, 3), (2, 4), (3, 4), (1, 5), (1, 6), (5, 6)
      α = 2                       b       (1, 3), (1, 4), (3, 4), (2, 5), (2, 6), (5, 6)
                                  c       (1, 4), (2, 4), (1, 2), (3, 5), (3, 6), (5, 6)
                                  d       (1, 4), (1, 6), (4, 6), (2, 3), (2, 5), (3, 5)
                                  e       (1, 3), (1, 5), (3, 5), (2, 4), (2, 6), (4, 6)
                                  f       (1, 2), (1, 5), (2, 5), (3, 4), (3, 6), (4, 6)
                                  g       (1, 4), (1, 5), (4, 5), (2, 3), (2, 6), (3, 6)
                                  h       (1, 3), (1, 6), (3, 6), (2, 4), (2, 5), (4, 5)
                                  k       (1, 2), (1, 6), (2, 6), (3, 4), (3, 5), (4, 5)
Table 1 (cont.)

Linked paired designs with α = 2














oa) Design 
No. Parameters 
Judge | Pairs 
ying ao Foe OF iis 2 Ses aS Bia aor 
(4) n=9, v=28 | (5, 6), (6, 7), (7, 8), (8, 9), (1, 9), (1, 2), (2, 3), (3, 4), (4, 5) 
| b=36, r=9 | (3, 7), (6, 8), (1, 8), (5, 7), (4, 5), (2, 9), (1, 4), (2, 3), (6, 9) 
k=7, A=2 | (5, 8), (3, 6), (2, 9), (4, 7), (1, 7), (2, 6), (4, 5), (1, 9), (3, 8) 
a=2 | (7, 8), (3, 4), (2, 6), (2, 8), (6, 9), (3, 5), (1, 7), (1, 4), (5, 9) 
(1, 2), (1, 6), (3, 5), (4, 8), (3, 8), (2, 7), (6, 9), (4, 5), (7, 9) 
(1, 8), (2, 3), (2, 7), (4, 6), (5, 9), (4, 9), (3, 8), (1, 7), (5, 6)
(2, 6), (1, 4), (8, 9), (2, 4), (5, 6), (1, 5), (7, 9), (3, 8), (3, 7)
+S) | (4, 9), (6, 9), (4, 7), (1, 3), (5, 8), (2, 8), (3, 7), (5, 6), (1, 2) 
| (1, 5), (5, 9), (4, 8), (3, 6), (1, 2), (4, 6), (7, 8), (3, 7), (2, 9) 
| 


(5, 7), (7, 9), (4, 6), (3, 4), (1, 8), (3, 9), (1, 2), (5, 8), (2, 6) 

(4, 7), (5, 6), (3, 9), (1, 6), (2, 9), (2, 4), (1, 8), (7, 8), (3, 5) 

(4, 8), (3, 7), (2, 5), (1, 9), (8, 5), (6, 7), (2, 6), (1, 8), (4, 9) 

(4, 6), (5, 8), (6, 7), (1, 4), (2, 7), (1, 3), (3, 5), (2, 9), (8, 9) 

(3, 9), (7, 8), (1, 3), (4, 5), (4, 9), (6, 8), (2. 7), (2, 6), (1, 5) 

(2, 4), (1, 2), (6, 8), (1, 7), (8, 9), (3, 6), (4, 9), (3, 5), (5, 7) 
5) 


WRARSHLSCSSTSIABVOCS ZF TSS | HFQwHe aeeces 














(2, 5), (1, 8), (3, 6), (6, 9), (1, 5), (3, 4), (8, 9), (2, 7), (4, 7) 
(6, 7), (2, 9), (3, 4), (3, 8), (5, 7), (1, 6), (1, 5), (4, 9), (2, 8) 
(1, 3), (2, 6), (1, 6), (5, 9), (4, 7), (2, 3), (5, 7), (8, 9), (4, 8) 
| (6, 8), (3, 5), (2, 3), (7, 9), (2, 8), (1, 9), (4, 7), (1, 5), (4, 6) 
ee 9), (5, 6), (4, 8), (1, 4), (2, 8), (5, 7), (3, 9) 
(1, 6), (8, 9), (4, 5), (3, 7), (3, 9), (1, 7), (4, 6), (2, 8), (2, 5) 
(2, 3), (1, 5), (1, 7), (5, 8), (2, 4), (6, 9), (3, 9), (4, 8), (6, 7) 
(1, 9), (5, 7), (6, 9), (7, 8), (2, 5), (3, 8), (2, 4), (4, 6), (1, 3) 
(1, 4), (4, 7), (3, 8), (1, 2), (6, 7), (5, 9), (2, 5), (3, 9), (6, 8) 
(4, 5), (2, 8), (5, 9), (1, 8), (1, 3), (7, 9), (6, 7), (2, 4), (3, 6) 
(1, 7), (4, 8), (7, 9), (2, 9), (6, 8), (5, 6), (1, 3), (2, 5), (3, 4) 
(5, 9), (2, 4), (3, 7)» (2, 7), (1, 6), (5, 8), (3, 4), (6, 8), (1, 9) 
(7, 9), (2, 5), (5, 8), (4, 9), (2, 3), (7, 8), (1, 6), (3, 6), (1, 4) 











The parameters of the design are 
$n = 8, \quad v = 7, \quad b = 28, \quad r = 12, \quad k = 3, \quad \lambda = 4, \quad \alpha = 3.$


Each judge compares 12 of the 28 possible pairs. Other designs obtained in this way are 
the designs nos. (1), (2) and (4) of Table 2. It should be noticed that the design (1) of Table 2 
is the same as the design (1) of Table 1, obtained in a different manner. 
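
The construction for even n can be sketched in a few lines. The code below assumes the initial set (1, 2t − 2), (2, 2t − 3), ..., (t − 1, t), (0, ∞) developed mod (2t − 1), and, for n = 8, the seven blocks abd, bce, cdf, deg, efa, fgb, gac matched in order with sets I to VII. The string "inf" stands for the object ∞, and the function names are illustrative only.

```python
from itertools import combinations


def round_robin_sets(t):
    """Develop the initial set mod (2t - 1); the object 'inf' stays fixed."""
    modulus = 2 * t - 1
    initial = [(i, modulus - i) for i in range(1, t)] + [(0, "inf")]
    sets = []
    for shift in range(modulus):
        pairs = set()
        for a, b in initial:
            a_new = (a + shift) % modulus
            b_new = b if b == "inf" else (b + shift) % modulus
            pairs.add(frozenset((a_new, b_new)))
        sets.append(pairs)
    return sets


# Assumed blocks of the design with v* = 7, k* = 3, lambda* = 1, in the order I to VII.
BIBD_BLOCKS = ["abd", "bce", "cdf", "deg", "efa", "fgb", "gac"]


def linked_design(sets, blocks):
    """Give each judge all pairs of every set whose block contains that judge."""
    design = {}
    for pair_set, block in zip(sets, blocks):
        for judge in block:
            design.setdefault(judge, set()).update(pair_set)
    return design


if __name__ == "__main__":
    design = linked_design(round_robin_sets(4), BIBD_BLOCKS)
    print({judge: len(pairs) for judge, pairs in sorted(design.items())})
    print({len(design[x] & design[y]) for x, y in combinations(design, 2)})
```

For t = 4 this gives seven judges with 12 pairs each and exactly four pairs common to any two judges, matching r = 12 and λ = 4.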

Again let the number of objects be odd, say n = 2t + 1. Then we can divide the t(2t + 1)
pairs into t sets of (2t + 1) pairs each, such that each object occurs exactly twice among the
pairs of a set. For example, if n = 7, we can take the objects to be 0, 1, 2, 3, 4, 5, 6. Then the
three sets are:

    Sets   Pairs
    I      (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0)
    II     (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 0), (6, 1)          (5.6)
    III    (0, 3), (1, 4), (2, 5), (3, 6), (4, 0), (5, 1), (6, 2)
In the general case we can take the objects to be 0, 1, ..., 2t. Then the ith set consists of all
pairs for which the difference mod (2t + 1) between the objects constituting the pair is i.

Let us now take a balanced incomplete block design with v' treatments, b' = t blocks,
r' replications, block size k' and in which every pair of treatments occurs together in the
same block $\lambda'$ times, and make each block correspond to one set and each treatment to
a judge. We then get a linked paired comparison design by assigning to each judge the sets
of pairs corresponding to all blocks in which the treatment corresponding to the judge
occurs. We obtain in this way a linked paired design with parameters

$n = 2t+1, \quad v = v', \quad b = t(2t+1), \quad r = (2t+1)r', \quad k = k', \quad \lambda = (2t+1)\lambda', \quad \alpha = 2r'$. (5.7)

For example, in the case n = 7, we get the linked paired comparison design no. (5) of
Table 2. It should be noted that the sets obtained in this case are the same as the tours
round the preference polygon considered by Kendall (1955), and lead to the designs already
considered by him.


















































Table 2 

Design                        Sets of pairs                              Sets of pairs 
No.    Parameters                                              Judge     assigned to a judge 

(1)    n = 4,  v = 3          I    (1, 2), (0, ∞)               a         II, III 
       b = 6,  r = 4          II   (2, 0), (1, ∞)               b         III, I 
       k = 2,  λ = 2          III  (0, 1), (2, ∞)               c         I, II 
       α = 2 

(2)    n = 6,  v = 5          I    (1, 4), (2, 3), (0, ∞)       a         II, III, IV, V 
       b = 15, r = 12         II   (2, 0), (3, 4), (1, ∞)       b         I, III, IV, V 
       k = 4,  λ = 9          III  (3, 1), (4, 0), (2, ∞)       c         I, II, IV, V 
       α = 4                  IV   (4, 2), (0, 1), (3, ∞)       d         I, II, III, V 
                              V    (0, 3), (1, 2), (4, ∞)       e         I, II, III, IV 

(3)    n = 8,  v = 7          The sets (5.1)                              The design (5.5) 
       b = 28, r = 12 
       k = 3,  λ = 4 
       α = 3 

(4)    n = 8,  v = 7          The sets (5.1)                    a         III, V, VI, VII 
       b = 28, r = 16                                           b         IV, VI, VII, I 
       k = 4,  λ = 8                                            c         V, VII, I, II 
       α = 4                                                    d         VI, I, II, III 
                                                                e         VII, II, III, IV 
                                                                f         I, III, IV, V 
                                                                g         II, IV, V, VI 

(5)    n = 7,  v = 3          The sets (5.6)                    a         II, III 
       b = 21, r = 14                                           b         III, I 
       k = 2,  λ = 7                                            c         I, II 
       α = 4 
ON THE DISTRIBUTION OF THE LARGEST OR THE SMALLEST 
ROOT OF A MATRIX IN MULTIVARIATE ANALYSIS* 


By K. C. S. PILLAI† 
University of North Carolina and University of Travancore 


1. INTRODUCTION 


This paper deals with the distribution problem of the extreme characteristic roots of a matrix 
in multivariate analysis (Roy, 1939; Hsu, 1939; Fisher, 1939). Roy (1945, 1953, 1954) has 
discussed in detail the usefulness of the extreme characteristic roots in tests of hypotheses 
and confidence interval estimation in multivariate situations. 

For obtaining tests of three different types of hypotheses in multivariate analysis, namely, 
(i) that of equality of the dispersion matrices of two p-variate normal populations, (ii) that 
of equality of the p-dimensional mean vectors for l p-variate normal populations and 
(iii) that of independence between a p-set and a q-set ($p \le q$) of variates in a (p+q)-variate 
normal population, in each case we arrive at the characteristic roots of a matrix based on 
sample observations. In each case, if the hypothesis to be tested is true, the non-zero roots 
($0 < \theta_1 \le \theta_2 \le \cdots \le \theta_s < 1$; $s \le p$) have the same joint distribution, the form of which was given 
by Roy (1939), Hsu (1939) and Fisher (1939). The distribution can be written in the form 

$$p(\theta_1, \ldots, \theta_s) = C(s,m,n) \prod_{i=1}^{s} \theta_i^{m}(1-\theta_i)^{n} \prod_{i>j} (\theta_i - \theta_j) \qquad (0 < \theta_1 \le \cdots \le \theta_s < 1), \tag{1}$$

where 

$$C(s,m,n) = \pi^{s/2} \prod_{i=1}^{s} \Gamma\!\left(\frac{2m+2n+s+i+2}{2}\right) \Big/ \left\{ \Gamma\!\left(\frac{2m+i+1}{2}\right) \Gamma\!\left(\frac{2n+i+1}{2}\right) \Gamma\!\left(\frac{i}{2}\right) \right\}, \tag{2}$$

and where m and n have to be interpreted differently for the different situations (Pillai, 
1954, 1955). 
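The normalising constant in (2) is easy to evaluate with log-gamma functions. The sketch below assumes the reading of (2) as $\pi^{s/2}\prod_i \Gamma\{(2m+2n+s+i+2)/2\}/[\Gamma\{(2m+i+1)/2\}\Gamma\{(2n+i+1)/2\}\Gamma(i/2)]$ and checks it against two special cases: the beta normaliser for s = 1 and the closed form of C(2, m, n) quoted in §4. The function names are illustrative only.

```python
import math


def log_c(s, m, n):
    """Natural log of C(s, m, n), read from equation (2)."""
    total = 0.5 * s * math.log(math.pi)
    for i in range(1, s + 1):
        total += math.lgamma((2 * m + 2 * n + s + i + 2) / 2)
        total -= math.lgamma((2 * m + i + 1) / 2)
        total -= math.lgamma((2 * n + i + 1) / 2)
        total -= math.lgamma(i / 2)
    return total


def c_two(m, n):
    """Closed form for s = 2: Gamma(2m+2n+5) / {4 Gamma(2m+2) Gamma(2n+2)}."""
    return math.gamma(2 * m + 2 * n + 5) / (
        4 * math.gamma(2 * m + 2) * math.gamma(2 * n + 2))


print(math.exp(log_c(2, 1, 3)), c_two(1, 3))
```

Working on the log scale keeps the computation stable for the large values of n (up to 1000) used later in Table 1.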

The problem of obtaining the cumulative distribution function (c.d.f.) of the largest root 
has been already investigated by Roy (1945), who gave explicit expressions for the c.d.f. for 
numbers of roots 2, 3 and 4. Nanda (1948) also gave such expressions for s = 2, 3, 4 and 5. 
The present author (1954) has extended these results for the c.d.f. for values of s up to 8. 
The expressions involved in these results are functions of incomplete beta functions and are 
useful for extensive tabulation of the probability integral. However, if one considers only 
the problem of obtaining the upper percentage points (5% or less), the approximations 
suggested in the following section will be useful. 


2. APPROXIMATE FORMULAE FOR OBTAINING UPPER PERCENTAGE POINTS 
(5 % OR LESS) OF THE LARGEST ROOT 


Let us first consider the case of two roots. The c.d.f. of the largest root in this case involves 
only two incomplete beta functions and can be written in an explicit form (Nanda, 1948; 
Pillai, 1954) given by 

$$\Pr(\theta_2 \le x) = \frac{C(2,m,n)}{m+n+2} \left\{ 2\int_0^x \theta_2^{2m+1}(1-\theta_2)^{2n+1}\, d\theta_2 - x^{m+1}(1-x)^{n+1} \int_0^x \theta_1^{m}(1-\theta_1)^{n}\, d\theta_1 \right\}. \tag{3}$$

* Part of a doctoral thesis, submitted to the Department of Statistics, University of North Carolina, 
Chapel Hill; work done under the sponsorship of the Ford Foundation. 
† Now with United Nations, New York. 







For integral values of m, integration by parts of the two integrals in (3) will reduce the 
probability integral in (3) to the form 

$$\Pr(\theta_2 \le x) = \frac{C(2,m,n)}{m+n+2} \left\{ \sum_{i=0}^{m} \frac{(m)_i\, \Gamma(n+1)}{\Gamma(n+i+2)}\, x^{2m-i+1}(1-x)^{2n+i+2} - 2\sum_{i=0}^{2m+1} \frac{(2m+1)_i\, \Gamma(2n+2)}{\Gamma(2n+i+3)}\, x^{2m-i+1}(1-x)^{2n+i+2} + \frac{2\Gamma(2m+2)\Gamma(2n+2)}{\Gamma(2m+2n+4)} - \frac{\Gamma(m+1)\Gamma(n+1)}{\Gamma(m+n+2)}\, x^{m+1}(1-x)^{n+1} \right\}, \tag{4}$$

where $(t)_r = t(t-1)\cdots(t-r+1)$. 
In (4) if we neglect terms of order higher than $x^{m+1}(1-x)^{n+1}$, we get 

$$\Pr(\theta_2 \le x) \simeq \frac{C(2,m,n)}{m+n+2} \left\{ \frac{2\Gamma(2m+2)\Gamma(2n+2)}{\Gamma(2m+2n+4)} - \frac{\Gamma(m+1)\Gamma(n+1)}{\Gamma(m+n+2)}\, x^{m+1}(1-x)^{n+1} \right\}. \tag{5}$$

For small values of m, (5) will give the percentage points for the upper 5 % level or less. 
Using the fact that $\Pr(\theta_2 \le 1) = 1$, we obtain (5) in the simpler form 

$$\Pr(\theta_2 \le x) \simeq 1 - \frac{\Gamma(m+1)\Gamma(n+1)\Gamma(2m+2n+4)}{2\Gamma(m+n+2)\Gamma(2m+2)\Gamma(2n+2)}\, x^{m+1}(1-x)^{n+1}. \tag{6}$$

If the probability in (6) has to be 0.95, then 

$$\frac{\Gamma(m+1)\Gamma(n+1)\Gamma(2m+2n+4)}{2\Gamma(m+n+2)\Gamma(2m+2)\Gamma(2n+2)}\, x^{m+1}(1-x)^{n+1} = 0.05. \tag{7}$$

The expression in (7) is very simple for computation, especially when the values of m are 
small. The error involved in using this approximation has been computed, and the difference 
between the exact and the approximate probabilities has been found to occur in the fifth 
decimal place; on rounding off, there could be at most a difference of 1 in the 
fourth place. 
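Equation (7) has no closed-form solution for x, but it is easily solved numerically. The sketch below uses bisection on the interval between the mode of $x^{m+1}(1-x)^{n+1}$ and 1, where the left-hand side of (7) decreases monotonically; the choice of bisection and the function names are assumptions of this illustration, not the author's method of computation.

```python
import math


def log_coeff(m, n):
    """Log of Gamma(m+1)Gamma(n+1)Gamma(2m+2n+4) / {2 Gamma(m+n+2)Gamma(2m+2)Gamma(2n+2)}."""
    return (math.lgamma(m + 1) + math.lgamma(n + 1) + math.lgamma(2 * m + 2 * n + 4)
            - math.log(2) - math.lgamma(m + n + 2)
            - math.lgamma(2 * m + 2) - math.lgamma(2 * n + 2))


def upper_point(m, n, alpha=0.05, tol=1e-12):
    """Solve coeff * x^(m+1) * (1-x)^(n+1) = alpha for the upper root x."""
    def tail(x):
        return math.exp(log_coeff(m, n) + (m + 1) * math.log(x)
                        + (n + 1) * math.log1p(-x))

    lo = (m + 1) / (m + n + 2)   # mode of x^(m+1) (1-x)^(n+1)
    hi = 1.0 - 1e-15
    if tail(lo) <= alpha:
        raise ValueError("approximation (7) has no upper root for this alpha")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(upper_point(0, 5), 3), round(upper_point(2, 10), 3))
```

For m = 0, n = 5 and for m = 2, n = 10 the roots agree to three decimals with the upper 5 % points 0.565 and 0.514 tabulated in Table 1.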

Proceeding on the same lines as above, i.e. as for the approximation in (6), we get the 
following approximations for s = 3, 4 and 5: 


r 3 
Pr (6,<2) = roe, 038(1— 04)" dB, (+1) (2m-+3) 


— | [an4(1 — 05)" a0 d0,\(m+1)( ) (2m + 2n-+5)—am'2(1 — 2)" (m+ 2045), (8) 
\Jo 


| 


P(2m + 2n + 8) P(m-+ 2) P(n +2) 


Pr (0,<x) = 1- 21(m+n+ 3) T(2m+ 4) T(2n+4) 


gmt a | ps az) 


D(m+1)P(n -. 2) ) P(2m +2n+8) |. um+2() ie. x)nti 


41 (2m + 2) T(2n +4) T(m+n+4) 


_ (m+2)P(m+1) (n+ 2) P(2m + 2n+7) 
~ 47'(2m + 2) T(2n + 4) P(m+n+4) 





ger) —x)r*1, (9) 
fs _ T(m+n+4) oe 9\ /¢ ~ m m n 
Pr(9,< x) = 30 (m+1)T (n+ 3)|(2m™ + 9) (2m +5)" O31 —0,)" dd; 
2 
_, (2m + 5) (2m + 2n +7) (2m? + 2mn + 13m + 12) “OB*(1 05)" dB 
(27 + 3) 
(2m 9). 2n27) (2m 20 9) Og*2(1—0,)" dB, 
(2n +3) 


_(m+3) ) (2m + 5) (2m + 2n + 5) (2m + 2n +7) xm+2(] — ayntd 


(2n + 3) (m+ 1) 


43 (2m + 2n eee + 9) gm+3(] — g)ntl 
_ (2m+2n+7) (2m+ 2n+9)(m+n+ 4) ym n+ 
panes +4(] — ntl (10) 


The incomplete beta functions in (8) and (10) can be further integrated by parts, and for 
small values of m the expressions will become simpler and could be readily used for 
computation. 


3. PERCENTAGE POINTS FOR THE LARGEST ROOT 


An important problem in multivariate analysis is to test the hypothesis that l p-variate 
normal populations having the same variance-covariance matrix, from each of which a 
sample is drawn, have the same mean for each variate. This, in fact, is the second among 
the three tests discussed in the introduction of the paper. For this test it turns out that 


m = — and n= od (11) 





For p = 2 we have m = ½(l − 4), and the expressions for the upper percentage points given 
in §2, which are good for small values of m, become quite useful, because the number of 
samples, l, cannot be too large. For example, the case of m = 4, i.e. l = 12, relates to 
a problem with 12 samples. Using (7), tables of upper 5 and 1 % points have been computed 
for two roots for values of m = 0, 1, 2, 3 and 4 and n = 5, 10, 15, 20, 25, 30, 40, 60, 80, 100, 
130, 160, 200, 300, 500 and 1000. Significance levels for fractional values of m and 
intermediate values of n can be obtained by interpolation. The percentage points based on the 
approximate formula (7) are given in Table 1. 

In order to form an idea of the error of approximation, the probability based on the exact 
formulae is calculated at the percentage points for the parameters m = 2 and n = 10, 
30 and 80 and m = 4, n = 5 and 100, picked out from Table 1 (based on the approximate 
formula (7)). The difference between the exact and approximate probabilities is exhibited 
in Table 2. It is to be noted that within the range of values of the parameters considered, the 
approximation is quite good. 
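The size of the approximation error can be checked independently by evaluating the exact c.d.f. (3) with numerical quadrature and comparing it with (6). The sketch below uses Simpson's rule, which is an assumption of this illustration (the paper does not say how the exact values were obtained), and takes the tabulated three-figure point x = 0.514 for m = 2, n = 10.

```python
import math


def simpson(f, a, b, steps=4000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def scale(m, n):
    """C(2, m, n) / (m + n + 2) = Gamma(2m+2n+4) / {2 Gamma(2m+2) Gamma(2n+2)}."""
    return math.exp(math.lgamma(2 * m + 2 * n + 4) - math.log(2)
                    - math.lgamma(2 * m + 2) - math.lgamma(2 * n + 2))


def exact_cdf(x, m, n):
    """Equation (3), for non-negative integer m and n."""
    first = simpson(lambda t: t ** (2 * m + 1) * (1 - t) ** (2 * n + 1), 0.0, x)
    second = simpson(lambda t: t ** m * (1 - t) ** n, 0.0, x)
    return scale(m, n) * (2 * first - x ** (m + 1) * (1 - x) ** (n + 1) * second)


def approx_cdf(x, m, n):
    """Equation (6)."""
    beta = math.exp(math.lgamma(m + 1) + math.lgamma(n + 1) - math.lgamma(m + n + 2))
    return 1 - scale(m, n) * beta * x ** (m + 1) * (1 - x) ** (n + 1)


difference = approx_cdf(0.514, 2, 10) - exact_cdf(0.514, 2, 10)
print(difference)
```

The difference is positive and lies in the fifth decimal place, in line with the entries of Table 2; exact agreement with the tabulated figure should not be expected because x has been rounded to three decimals here.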

Nanda (1951) has given significance levels (upper 5 and 1 %) for the largest root for s = 2 
and very small values of m and n (m = 0(½)2; n = ½(½)10). His table has been computed by 
using the Tables of the Incomplete Beta Function (1934) for some values of m and n, the other 
values being obtained by interpolation. The significance levels are given only up to two 
decimals, and the range of m and n considered is very small. In Table 1, however, the range 
for n is from 5 to 1000 and for m is 0(1)4. The significance levels have been given correct to 
three significant figures, in order to ensure sufficient accuracy for interpolation. For testing 
the equality of variate means from different populations, Table 1 should be enough up to 
m = 4, but if anything beyond this is needed one can obtain the significance level easily from 
the expression in (6) for small values of m. 


Table 1. Percentage points of the largest root for s = 2 

(a) Upper 5 % points 

   n       m = 0      m = 1      m = 2      m = 3      m = 4 

   5       0.565      0.651      0.706      0.746      0.776 
  10       0.374      0.455      0.514      0.561      0.598 
  15       0.278      0.348      0.402      0.445      0.483 
  20       0.222      0.281      0.329      0.369      0.403 
  25       0.183      0.236      0.278      0.314      0.346 
  30       0.157      0.203      0.241      0.274      0.303 
  40       0.121      0.158      0.190      0.218      0.243 
  60       0.0836     0.110      0.133      0.154      0.173 
  80       0.0638     0.0846     0.1027     0.1191     0.1345 
 100       0.0515     0.0686     0.0835     0.0972     0.1100 
 130       0.0400     0.0535     0.0652     0.0761     0.0864 
 160       0.0327     0.0437     0.0535     0.0626     0.0711 
 200       0.0263     0.0352     0.0432     0.0506     0.0576 
 300       0.0176     0.0237     0.0291     0.0342     0.0390 
 500       0.0106     0.0143     0.0176     0.0207     0.0237 
1000       0.00535    0.00719    0.00888    0.01045    0.01195 

(b) Upper 1 % points 

   n       m = 0      m = 1      m = 2      m = 3      m = 4 

   5       0.675      0.745      0.787      0.817      0.839 
  10       0.470      0.544      0.597      0.638      0.670 
  15       0.357      0.425      0.476      0.517      0.551 
  20       0.288      0.347      0.394      0.433      0.467 
  25       0.240      0.293      0.336      0.372      0.403 
  30       0.207      0.254      0.293      0.326      0.355 
  40       0.161      0.200      0.232      0.261      0.286 
  60       0.1114     0.140      0.165      0.186      0.206 
  80       0.0852     0.1080     0.1273     0.1448     0.1609 
 100       0.0692     0.0878     0.1038     0.1184     0.1319 
 130       0.0539     0.0685     0.0813     0.0930     0.1039 
 160       0.0441     0.0562     0.0668     0.0765     0.0857 
 200       0.0355     0.0453     0.0540     0.0619     0.0694 
 300       0.0239     0.0305     0.0365     0.0419     0.0471 
 500       0.0144     0.0185     0.0221     0.0255     0.0287 
1000       0.00725    0.00930    0.01114    0.01285    0.01448 


Table 2. The error of approximation at the 5 % level (i.e. difference in value 
between (3) and (6) for s = 2 at the 5 % significance level in Table 1) 

















            m = 2                                m = 4 
   n        difference                  n        difference 
            (approximate − exact)                (approximate − exact) 

  10        0.00002693                  5        0.00002574 
  30        0.00002697                100        0.00003477 
  80        0.00002757 





4. THE DISTRIBUTIONS OF THE LARGEST AND SMALLEST ROOTS FOR 
s = 2 AS HYPERGEOMETRIC SERIES 


The joint-probability density function of $\theta_1$ and $\theta_2$ is given by 

$$p(\theta_1, \theta_2) = C(2,m,n)\, (\theta_1\theta_2)^{m} \{(1-\theta_1)(1-\theta_2)\}^{n} (\theta_2 - \theta_1) \qquad (0 < \theta_1 \le \theta_2 < 1), \tag{12}$$

where 

$$C(2,m,n) = \Gamma(2m+2n+5) / \{2^2\, \Gamma(2m+2)\, \Gamma(2n+2)\}.$$

Now consider the transformation 

$$\lambda = \theta_1/\theta_2, \qquad \theta_2 = \theta_2. \tag{13}$$

Then 

$$p(\lambda, \theta_2) = C(2,m,n)\, \theta_2^{2m+2}(1-\theta_2)^{n} (1-\lambda\theta_2)^{n} \lambda^{m}(1-\lambda), \tag{14}$$

where the range of variation of $\lambda$ as well as $\theta_2$ is 0 to 1. Now integrating out $\lambda$ from $p(\lambda, \theta_2)$, 

$$p(\theta_2) = C(2,m,n)\, \theta_2^{2m+2}(1-\theta_2)^{n} \int_0^1 (1-\lambda\theta_2)^{n} \lambda^{m}(1-\lambda)\, d\lambda, \tag{15}$$

or 

$$p(\theta_2) = \frac{C(2,m,n)}{(m+1)(m+2)}\, \theta_2^{2m+2}(1-\theta_2)^{n}\, F(m+1, -n, m+3, \theta_2), \tag{16}$$

where 

$$F(m+1, -n, m+3, \theta_2) = 1 - \frac{n(m+1)}{1\cdot(m+3)}\, \theta_2 + \frac{n(n-1)(m+1)(m+2)}{1\cdot 2\cdot(m+3)(m+4)}\, \theta_2^{2} - \cdots. \tag{17}$$

Again starting with (12) and effecting the following transformation 

$$(1-\theta_1)\mu = 1-\theta_2 \quad\text{and}\quad \theta_1 = \theta_1, \tag{18}$$

we get 

$$p(\theta_1) = C(2,m,n)\, \theta_1^{m}(1-\theta_1)^{2n+2} \int_0^1 \mu^{n}(1-\mu) \{1-\mu(1-\theta_1)\}^{m}\, d\mu, \tag{19}$$

or 

$$p(\theta_1) = \frac{C(2,m,n)}{(n+1)(n+2)}\, \theta_1^{m}(1-\theta_1)^{2n+2}\, F(n+1, -m, n+3, 1-\theta_1). \tag{20}$$

It may be observed that a change from $1-\theta_1$ to $\theta_2$ and m to n in (20) gives (16). Hence 

$$\Pr(\theta_1 \le x) = \Pr(\theta_1 \le x;\, m, n) = \Pr(1-\theta_2 \le x;\, n, m) = \Pr(1-x \le \theta_2;\, n, m) = 1 - \Pr(\theta_2 \le 1-x;\, n, m). \tag{21}$$

Hence lower percentage points for the smallest root can be obtained from the upper 
percentage points of the largest root when the parameters m and n are interchanged. 

The result (21) is true for any value of s as Nanda (1948) has shown, starting from 
distribution (1). Hence the expressions for the c.d.f. of the largest root given in §2 can be used 
to obtain the lower percentage points (5 % or less) of the smallest root by noting the relation 
(21) and substituting s for 2. 
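When n is a non-negative integer the hypergeometric series in the density of the largest root terminates, so the density can be evaluated exactly with a finite sum. The sketch below is an illustration under that assumption; it writes the density as $C(2,m,n)\,\theta^{2m+2}(1-\theta)^{n} F(m+1,-n,m+3,\theta)/\{(m+1)(m+2)\}$ and confirms numerically that it integrates to one. Function names are hypothetical.

```python
import math


def c_two(m, n):
    """C(2, m, n) = Gamma(2m+2n+5) / {4 Gamma(2m+2) Gamma(2n+2)}."""
    return math.gamma(2 * m + 2 * n + 5) / (
        4 * math.gamma(2 * m + 2) * math.gamma(2 * n + 2))


def hyp_terminating(a, n, c, z):
    """F(a, -n, c, z) for a non-negative integer n (finite sum of n + 1 terms)."""
    term, total = 1.0, 1.0
    for j in range(n):
        term *= (a + j) * (j - n) / ((c + j) * (j + 1)) * z
        total += term
    return total


def density_largest(theta, m, n):
    """Marginal density of the largest of two roots (n a non-negative integer)."""
    return (c_two(m, n) / ((m + 1) * (m + 2))
            * theta ** (2 * m + 2) * (1 - theta) ** n
            * hyp_terminating(m + 1, n, m + 3, theta))


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


mass = simpson(lambda t: density_largest(t, 1, 3), 0.0, 1.0)
print(mass)
```

By the symmetry noted in the text, the density of the smallest root with parameters (m, n) at θ is obtained by calling `density_largest(1 - theta, n, m)`, provided m is then a non-negative integer.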


5. SUMMARY 


An approximation to the cumulative distribution function of the largest (smallest) root of 
a matrix in multivariate analysis for obtaining the upper (5 % or less) percentage points for 
the largest root (lower (5 % or less) percentage points of the smallest) is given for any number 
of roots up to 5. Percentage points for two roots for small integral values of m have also been 
presented. The distributions of the largest and smallest roots are also developed as hyper- 
geometric series in the case of s = 2. 


I wish to acknowledge my indebtedness to Prof. S. N. Roy for his kind advice and 
criticism in the course of my investigations. I am also indebted to Prof. H. Hotelling for 
his active interest and valuable suggestions in the preparation of this paper. 
TESTS OF SIGNIFICANCE FOR THE LATENT ROOTS OF 
COVARIANCE AND CORRELATION MATRICES 


By D. N. LAWLEY 
Mathematical Institute, Edinburgh 


1. STATEMENT OF PROBLEM 


In a recent note Bartlett (1954) has summarized various approximate $\chi^2$ tests (see also other 
papers referred to therein). Our object in this paper is to extend certain of these, listed as 
III a, b, c, which may be regarded as tests involving the latent roots of sample covariance 
and correlation matrices. We shall be interested in those cases where the effects of the 
k largest latent roots have been removed and where a hypothesis of equality of the remaining 
roots is made. Such hypotheses are made in principal components and factor analyses. 


2. RESIDUAL LATENT ROOTS OF A COVARIANCE MATRIX WHEN THE 
TRUE VALUE λ IS KNOWN 


Suppose that $x_1, x_2, \ldots, x_p$ are p variates following a multivariate normal distribution with 
true covariance matrix $A_0 = [\alpha_{ij}]$. Assuming that the first k latent roots $\lambda_1, \lambda_2, \ldots, \lambda_k$ of $A_0$, 
in order of magnitude, are distinct, we make the hypothesis that the residual latent roots 
$\lambda_{k+1}, \lambda_{k+2}, \ldots, \lambda_p$ are equal to some value $\lambda$. We shall consider first the case where $\lambda$ is known. 

Let $A = [a_{ij}]$ be a sample covariance matrix with n degrees of freedom (corresponding as 
a rule to a sample of size n + 1). Let $l_i$ (i = 1, 2, ..., p) be the latent roots in order of magnitude 
of A. It is assumed that n is sufficiently large for us to be able to set up an almost certain 
correspondence between the first k of the $l_i$ and the first k of the $\lambda_i$. More precisely, we 
require that sampling errors, which are of order $1/\sqrt{n}$, should be small compared with the 
differences between successive quantities $\lambda_1, \lambda_2, \ldots, \lambda_k, \lambda$. The degree of uncertainty in 
establishing the correspondence between the first k of the $l_i$ and the first k of the $\lambda_i$, though 
difficult to evaluate, appears to decrease very rapidly as n increases and is in general 
unlikely to affect appreciably the approximations developed in the paper. 

To test the above hypothesis we use as an approximate $\chi^2$, with $\tfrac12(p-k)(p-k+1)$ 
degrees of freedom, the expression 

$$-\log_e\!\left(l_{k+1} l_{k+2} \cdots l_p / \lambda^{p-k}\right) + (l_{k+1} + l_{k+2} + \cdots + l_p)/\lambda - (p-k), \tag{1}$$

multiplied by an appropriate factor which we have to determine. (It is easy to verify that 
for k = 0, this is equivalent to Bartlett's test if we put $V_0 = \lambda I$ in his expression of IIIa.) 
The multiplying factor will be determined by finding the expectation of expression (1), 
since this gives a better approximation than the cruder value n. 

To simplify the problem we may, without loss of generality, assume that all true 
correlations are zero, i.e. that $A_0$ is diagonal. This is justifiable, since any orthogonal transformation 
of the variates $x_i$ leaves the latent roots of the covariance matrix unaltered. It will be 
convenient to partition $A_0$ and A in such a way as to divide the first k rows and columns 
from the last (p − k) rows and columns. We may then write 

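$$A_0 = \begin{bmatrix} \Lambda & 0 \\ 0 & \lambda I \end{bmatrix}, \qquad A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $\Lambda$ is the diagonal matrix with elements $\lambda_1, \lambda_2, \ldots, \lambda_k$. 
We shall use the symbol $\delta$ to denote deviations of sample values from expectation so that, 
for example, $\delta A = A - A_0$, $\delta a_{ij} = a_{ij} - \alpha_{ij}$, $\delta l_i = l_i - \lambda_i$. All such quantities are of order 
$1/\sqrt{n}$, and in what follows we shall be able to neglect terms of sufficiently high degree in 
the $\delta$'s. Partitioning $\delta A$ in the same manner as A, we may write 

$$A_{11} = \Lambda + \delta A_{11}, \qquad A_{22} = \lambda I + \delta A_{22}, \qquad A_{12} = \delta A_{12}.$$

If we are prepared to neglect the squares of the elements of $\delta A_{12}$, the residual latent roots 
of A could be taken as being approximately equal to the latent roots of $A_{22}$. This would not, 
however, give us sufficient accuracy, since we wish to evaluate the expectation of expression 
(1) as far as terms in $1/n^2$. To accomplish this we define a matrix B having the same latent 
roots as A and given by 

$$B = (I + E)\, A\, (I + E)^{-1},$$

where 

$$E = \begin{bmatrix} E_{11} & E_{12} \\ -E_{12}' & 0 \end{bmatrix}, \qquad E_{12} = (\Lambda - \lambda I)^{-1} \delta A_{12},$$

$$E_{11} = \begin{bmatrix} 0 & \dfrac{\delta a_{12}}{\lambda_1 - \lambda_2} & \cdots & \dfrac{\delta a_{1k}}{\lambda_1 - \lambda_k} \\ \dfrac{\delta a_{12}}{\lambda_2 - \lambda_1} & 0 & \cdots & \dfrac{\delta a_{2k}}{\lambda_2 - \lambda_k} \\ \vdots & \vdots & & \vdots \\ \dfrac{\delta a_{1k}}{\lambda_k - \lambda_1} & \dfrac{\delta a_{2k}}{\lambda_k - \lambda_2} & \cdots & 0 \end{bmatrix}.$$

We now write 

$$B = (I + E)(A_0 + \delta A)(I + E)^{-1} = A_0 + \{(E A_0 - A_0 E + \delta A) + E\, \delta A\}(I - E + E^2 - \cdots),$$

and we note that 

$$E A_0 - A_0 E + \delta A = \begin{bmatrix} E_{11}\Lambda - \Lambda E_{11} + \delta A_{11} & 0 \\ 0 & \delta A_{22} \end{bmatrix},$$

and that $E_{11}\Lambda - \Lambda E_{11} + \delta A_{11}$ is a diagonal matrix with elements $\delta a_{11}, \delta a_{22}, \ldots, \delta a_{kk}$. Hence, 
partitioning B in the usual manner, we have 

$$B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$

where the elements of $B_{12}$, $B_{21}$ and the non-diagonal elements of $B_{11}$ are all of order $\delta^2$. From 
this it follows that with errors of order $\delta^4$ the residual roots $l_{k+1}, l_{k+2}, \ldots, l_p$ are equal to the 
latent roots of 

$$B_{22} = A_{22} - \delta A_{21}(\Lambda - \lambda I)^{-1} \delta A_{12} - \delta A_{22}\, \delta A_{21}(\Lambda - \lambda I)^{-2} \delta A_{12} + \delta A_{21}(\Lambda - \lambda I)^{-1} \delta A_{11} (\Lambda - \lambda I)^{-1} \delta A_{12} + O(\delta^4). \tag{2}$$

It also follows that, to the same degree of accuracy, $l_r$ (r = 1, 2, ..., k) is equal to the rth 
diagonal element of $B_{11}$, which is 

$$\lambda_r + \delta a_{rr} + {\sum_i}' \frac{(\delta a_{ri})^2}{\lambda_r - \lambda_i} - \delta a_{rr} {\sum_i}' \frac{(\delta a_{ri})^2}{(\lambda_r - \lambda_i)^2} + {\sum_i}' {\sum_j}' \frac{\delta a_{ri}\, \delta a_{rj}\, \delta a_{ij}}{(\lambda_r - \lambda_i)(\lambda_r - \lambda_j)} + O(\delta^4), \tag{3}$$

where $\sum'$ indicates that the summation is to be over all values from 1 to p except r. 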
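It is easy to see that errors of order $\delta^4$ in the latent roots $l_{k+1}, l_{k+2}, \ldots, l_p$ will give rise to an 
error only of order $\delta^5$ in expression (1), so that in it we may replace the product of the latent 
roots by the determinant of $B_{22}$ and the sum by the trace of $B_{22}$. In the ensuing algebra it will 
be convenient to put $\lambda = 1$ and to replace $\lambda_r$ by $\lambda_r/\lambda$ at the conclusion. The expression (1) 
then becomes 

$$-\log_e |B_{22}| + \operatorname{tr}(B_{22}) - q,$$

where $q = p - k$. 
Now, writing $B_{22}$ as $A_{22} - Z$, where Z is given by (2), we have 

$$|B_{22}| = |A_{22} - Z| = |A_{22}|\, |I - A_{22}^{-1} Z| = |A_{22}|\, |I - \{I - \delta A_{22} + (\delta A_{22})^2 - \cdots\} Z|.$$

From this it follows that 

$$\log_e |B_{22}| = \log_e |A_{22}| - \operatorname{tr}(Z) + \operatorname{tr}\{(\delta A_{22}) Z\} - \operatorname{tr}\{(\delta A_{22})^2 Z\} - \tfrac12 \operatorname{tr}(Z^2) + O(\delta^5).$$

We also have $\operatorname{tr}(B_{22}) = \operatorname{tr}(A_{22}) - \operatorname{tr}(Z)$. 
Hence the expression (1) may finally be written as 

$$[-\log_e |A_{22}| + \operatorname{tr}(A_{22}) - q] - \operatorname{tr}\{(\delta A_{22}) Z\} + \operatorname{tr}\{(\delta A_{22})^2 Z\} + \tfrac12 \operatorname{tr}(Z^2) + O(\delta^5). \tag{4}$$

The part of (4) within the square brackets is merely what would arise if we were testing 
the equality of all the latent roots of the covariance matrix $A_{22}$, of order q. Using Bartlett's 
result (IIIa) we can evaluate its expectation as 

$$\tfrac12 q(q+1) \left\{ \frac{1}{n} + \frac{1}{6n^2}\left(2q + 1 - \frac{2}{q+1}\right) \right\} + O\!\left(\frac{1}{n^3}\right).$$

To find the expectations of the remaining terms of (4), and of other quantities occurring 
later, we need the results exemplified as follows, where quantities of order $1/n^3$ are neglected: 

$$E\{(\delta a_{11})^3\} = 8\lambda_1^3/n^2, \qquad E\{\delta a_{11}(\delta a_{12})^2\} = 2\lambda_1^2\lambda_2/n^2,$$
$$E\{\delta a_{12}\, \delta a_{13}\, \delta a_{23}\} = \lambda_1\lambda_2\lambda_3/n^2, \qquad E\{(\delta a_{11})^4\} = 12\lambda_1^4/n^2,$$
$$E\{(\delta a_{11})^2 (\delta a_{22})^2\} = 4\lambda_1^2\lambda_2^2/n^2, \qquad E\{(\delta a_{11})^2 (\delta a_{12})^2\} = 2\lambda_1^3\lambda_2/n^2,$$
$$E\{(\delta a_{11})^2 (\delta a_{23})^2\} = 2\lambda_1^2\lambda_2\lambda_3/n^2, \qquad E\{(\delta a_{12})^4\} = 3\lambda_1^2\lambda_2^2/n^2,$$
$$E\{(\delta a_{12})^2 (\delta a_{13})^2\} = \lambda_1^2\lambda_2\lambda_3/n^2, \qquad E\{(\delta a_{12})^2 (\delta a_{34})^2\} = \lambda_1\lambda_2\lambda_3\lambda_4/n^2.$$

Terms of other kinds which arise all have expectations of order $1/n^3$ or less. 

By making use of the above results, substituting for Z from (2) and neglecting terms of 
order $1/n^3$ we find that 

$$E[\operatorname{tr}\{(\delta A_{22}) Z\}] = \frac{q(q+1)}{n^2} \left\{ \sum_{r=1}^{k} \frac{\lambda_r}{\lambda_r - 1} + \sum_{r=1}^{k} \frac{\lambda_r}{(\lambda_r - 1)^2} \right\} = q(q+1)\Sigma_2/n^2,$$
$$E[\operatorname{tr}\{(\delta A_{22})^2 Z\}] = q(q+1)\Sigma_1/n^2,$$
$$E[\operatorname{tr}(Z^2)] = E[\operatorname{tr}\{\delta A_{21}(\Lambda - I)^{-1} \delta A_{12}\}^2] = \{q\Sigma_1^2 + q(q+1)\Sigma_2\}/n^2,$$

where 

$$\Sigma_1 = \sum_{r=1}^{k} \left(\frac{\lambda_r}{\lambda_r - 1}\right), \qquad \Sigma_2 = \sum_{r=1}^{k} \left(\frac{\lambda_r}{\lambda_r - 1}\right)^2.$$

Hence the expectation of expression (4) is 

$$\tfrac12 q(q+1) \left\{ \frac{1}{n} + \frac{1}{6n^2}\left(2q + 1 - \frac{2}{q+1}\right) \right\} + \frac{1}{2n^2} \{ q\Sigma_1^2 + q(q+1)(2\Sigma_1 - \Sigma_2) \}$$
$$= \tfrac12 q(q+1) \left[ \frac{1}{n} + \frac{1}{n^2} \left\{ \frac16\left(2q + 1 - \frac{2}{q+1}\right) + k - \sum_{r=1}^{k} \frac{1}{(\lambda_r - 1)^2} + \frac{1}{q+1}\Sigma_1^2 \right\} \right].$$

This means that, for general $\lambda$, the correct multiplying factor for (1) is 

$$n - k - \frac16\left(2q + 1 - \frac{2}{q+1}\right) + \lambda^2 \sum_{r=1}^{k} \frac{1}{(\lambda_r - \lambda)^2} - \frac{1}{q+1} \left( \sum_{r=1}^{k} \frac{\lambda_r}{\lambda_r - \lambda} \right)^2.$$

If practical use were to be made of this result, estimated values $l_r$ would usually have to 
be substituted for the true latent roots. If $\lambda_1, \lambda_2, \ldots, \lambda_k$ are all large compared with $\lambda$ then 
$\lambda^2 \sum 1/(\lambda_r - \lambda)^2$ is small, $\sum \lambda_r/(\lambda_r - \lambda)$ is approximately equal to k and the multiplying factor 
becomes approximately 

$$n - k - \frac16\left(2q + 1 - \frac{2}{q+1}\right) - \frac{k^2}{q+1}.$$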
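The multiplying factor for the case of known λ can be computed directly from the dominant roots. The sketch below assumes the factor has the form $n - k - \tfrac16\{2q+1-2/(q+1)\} + \lambda^2\sum 1/(\lambda_r-\lambda)^2 - \{\sum \lambda_r/(\lambda_r-\lambda)\}^2/(q+1)$ with q = p − k; the function name and argument order are choices made for this illustration.

```python
def factor_known(n, p, dominant_roots, lam):
    """Multiplying factor for criterion (1) when the residual root lam is known.

    n: degrees of freedom of the sample covariance matrix
    p: number of variates
    dominant_roots: the k roots (true or estimated) that exceed lam
    """
    k = len(dominant_roots)
    q = p - k
    inv_sq = sum(lam ** 2 / (root - lam) ** 2 for root in dominant_roots)
    ratio_sum = sum(root / (root - lam) for root in dominant_roots)
    return (n - k - (2 * q + 1 - 2 / (q + 1)) / 6
            + inv_sq - ratio_sum ** 2 / (q + 1))


# With no dominant roots the factor reduces to the k = 0 value n - (2p + 1 - 2/(p + 1))/6.
print(factor_known(50, 4, [], 1.0))
# One dominant root of moderate size.
print(factor_known(50, 4, [6.0], 1.0))
```

When every dominant root is very large compared with λ the result tends to the simpler approximate factor $n - k - \tfrac16\{2q+1-2/(q+1)\} - k^2/(q+1)$.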


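3. THE CASE WHERE λ IS UNKNOWN 


Now consider the case where the hypothesis is still that the smallest p − k latent roots of 
$A_0$ are equal but where the common value $\lambda$ is unknown. The approximate $\chi^2$ used for 
testing the hypothesis now has $\tfrac12(p-k-1)(p-k+2)$ degrees of freedom, one less than 
before, and may be expressed as 

$$-\log_e (l_{k+1} l_{k+2} \cdots l_p) + (p-k) \log_e \{(l_{k+1} + l_{k+2} + \cdots + l_p)/(p-k)\}, \tag{5}$$

multiplied by a factor, which when k = 0 has been shown by Bartlett (his case IIIc) to be 

$$n - \frac16\left(2p + 1 + \frac{2}{p}\right).$$

With the same notation as before and to the same order of approximation (5) may be 
replaced by 

$$-\log_e |B_{22}| + q \log_e \left\{ \frac{1}{q} \operatorname{tr}(B_{22}) \right\}.$$

Since, putting $\lambda = 1$, 

$$q \log_e \left\{ \frac{1}{q} \operatorname{tr}(B_{22}) \right\} - q \log_e \left\{ \frac{1}{q} \operatorname{tr}(A_{22}) \right\} = q \log_e \left\{ 1 - \frac{1}{q} \left( 1 + \frac{1}{q} \operatorname{tr} \delta A_{22} \right)^{-1} \operatorname{tr} Z \right\}$$
$$= -\operatorname{tr} Z + \frac{1}{q} (\operatorname{tr} \delta A_{22})(\operatorname{tr} Z) - \frac{1}{q^2} (\operatorname{tr} \delta A_{22})^2 (\operatorname{tr} Z) - \frac{1}{2q} (\operatorname{tr} Z)^2 + O(\delta^5),$$

the expression (5) may be written as 

$$\left[ -\log_e |A_{22}| + q \log_e \left\{ \frac{1}{q} \operatorname{tr}(A_{22}) \right\} \right] + \left[ -\operatorname{tr}\{(\delta A_{22}) Z\} + \operatorname{tr}\{(\delta A_{22})^2 Z\} + \tfrac12 \operatorname{tr}(Z^2) \right]$$
$$+ \left[ \frac{1}{q} (\operatorname{tr} \delta A_{22})(\operatorname{tr} Z) - \frac{1}{q^2} (\operatorname{tr} \delta A_{22})^2 (\operatorname{tr} Z) - \frac{1}{2q} (\operatorname{tr} Z)^2 \right] + O(\delta^5). \tag{6}$$

The terms within the first pair of square brackets are those which would arise if we were 
testing the equality of all q latent roots of $A_{22}$, the common value being unknown. Their 
expectation is therefore 

$$\tfrac12 (q-1)(q+2) \left\{ \frac{1}{n} + \frac{1}{6n^2}\left(2q + 1 + \frac{2}{q}\right) \right\} + O\!\left(\frac{1}{n^3}\right).$$

The expectation of the terms in the second pair of square brackets has already been found 
to be 

$$\frac{q}{2n^2} \{ \Sigma_1^2 + (q+1)(2\Sigma_1 - \Sigma_2) \},$$

where $\Sigma_1$ and $\Sigma_2$ are as defined in the preceding section. 
We now find that 

$$E[(\operatorname{tr} \delta A_{22})(\operatorname{tr} Z)] = \frac{2q}{n^2} \left\{ \sum_{r=1}^{k} \frac{\lambda_r}{\lambda_r - 1} + \sum_{r=1}^{k} \frac{\lambda_r}{(\lambda_r - 1)^2} \right\} = 2q\Sigma_2/n^2,$$
$$E[(\operatorname{tr} \delta A_{22})^2 (\operatorname{tr} Z)] = 2q^2\Sigma_1/n^2,$$
$$E[(\operatorname{tr} Z)^2] = E[\{\operatorname{tr} \delta A_{21}(\Lambda - I)^{-1} \delta A_{12}\}^2] = (q^2\Sigma_1^2 + 2q\Sigma_2)/n^2.$$

Hence the expectation of the terms within the third pair of square brackets in (6) is 

$$(\Sigma_2 - 2\Sigma_1 - \tfrac12 q\Sigma_1^2)/n^2,$$

and the expectation of the whole expression (6) is 

$$\tfrac12 (q-1)(q+2) \left\{ \frac{1}{n} + \frac{1}{6n^2}\left(2q + 1 + \frac{2}{q}\right) \right\} + \tfrac12 (q-1)(q+2)(2\Sigma_1 - \Sigma_2)/n^2$$
$$= \tfrac12 (q-1)(q+2) \left[ \frac{1}{n} + \frac{1}{n^2} \left\{ \frac16\left(2q + 1 + \frac{2}{q}\right) + k - \sum_{r=1}^{k} \frac{1}{(\lambda_r - 1)^2} \right\} \right].$$

Thus, for general $\lambda$, the correct multiplying factor for (5) is 

$$n - k - \frac16\left(2q + 1 + \frac{2}{q}\right) + \lambda^2 \sum_{r=1}^{k} \frac{1}{(\lambda_r - \lambda)^2}.$$

In order to make practical use of this, estimates would have to be substituted for the $\lambda_r$, 
and the now unknown $\lambda$ would have to be estimated as $(l_{k+1} + l_{k+2} + \cdots + l_p)/q$. If $\lambda_1, \lambda_2, \ldots, \lambda_k$ 
are all large compared with $\lambda$, the last term could be omitted and the approximate value of 
the multiplying factor obtained from that for k = 0 by substituting n − k for n and p − k 
for p, as has been suggested by Bartlett. 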
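The practical recipe of this section can be sketched as follows: estimate λ by the mean of the residual sample roots, substitute the first k sample roots for the $\lambda_r$, and form the factor $n - k - \tfrac16(2q+1+2/q) + \lambda^2\sum 1/(\lambda_r-\lambda)^2$. The function name and the sample values are assumptions made for this illustration.

```python
def factor_unknown(n, sample_roots, k):
    """Multiplying factor for criterion (5) when the common residual root is unknown.

    n: degrees of freedom of the sample covariance matrix
    sample_roots: all p latent roots of the sample covariance matrix
    k: number of dominant roots whose effect is removed
    """
    roots = sorted(sample_roots, reverse=True)
    q = len(roots) - k
    lam = sum(roots[k:]) / q                      # estimate of the common residual root
    correction = sum((lam / (root - lam)) ** 2 for root in roots[:k])
    return n - k - (2 * q + 1 + 2 / q) / 6 + correction


# k = 0 gives the factor n - (2p + 1 + 2/p)/6 of Bartlett's case IIIc.
print(factor_unknown(50, [1.3, 1.1, 0.9, 0.7], 0))
# One dominant root; the residual roots average to 1.0.
print(factor_unknown(100, [11.0, 1.2, 1.0, 0.8], 1))
```

When the dominant roots are large relative to the residual mean the correction term is small, and the factor is close to the k = 0 value with n − k and p − k substituted for n and p.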

Strictly speaking what we have shown in this and in the preceding section is that the use 
of a certain multiplying factor will give the correct expectation for $\chi^2$, with an error of order 
$1/n^2$. By similar methods it is possible to show that in each case the variance of $\chi^2$ is correctly 
given, to the same order of approximation; but the algebra is rather laborious and is for this 
reason omitted. We have not attempted to investigate the higher moments, though it seems 
quite likely that these also will have an error only of order $1/n^2$. Even so there is a reasonable 
amount of justification for the use of the above multiplying factors. 

In correspondence Prof. M. S. Bartlett has pointed out to me a distinction which is worth 
making between the cases of $\lambda$ known and $\lambda$ unknown. Suppose that in a factor analysis 
$\lambda$ represents that part of the variance of each variate which is due to 'error'. If $\lambda$ is known, 
then acceptance of the hypothesis that the residual latent roots are equal to $\lambda$ implies that 
all residual variance is the result of error. If, on the other hand, $\lambda$ is unknown, all that we 
can infer from the equality of the residual roots is that the residual variance is the result of 
specific factors and error; we cannot distinguish between these two sources of variation. 


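4. PARAMETERS OF THE DISTRIBUTION OF THE FIRST k LATENT ROOTS 


It is perhaps of some interest to consider briefly the expression (3) for the rth latent root 
$l_r$ (r = 1, 2, ..., k) of A. Taking the expectation we have 

$$E(l_r) = \lambda_r + \frac{\lambda_r}{n} {\sum_i}' \left( \frac{\lambda_i}{\lambda_r - \lambda_i} \right) + O\!\left(\frac{1}{n^2}\right),$$

where $\sum'$ is as previously defined. Similarly for the variance of $l_r$ we have 

$$\frac{2\lambda_r^2}{n} \left\{ 1 - \frac{1}{n} {\sum_i}' \left( \frac{\lambda_i}{\lambda_r - \lambda_i} \right)^2 \right\} + O\!\left(\frac{1}{n^3}\right),$$

and the third and fourth cumulants of $l_r$ are given by 

$$\kappa_3(l_r) = \frac{8\lambda_r^3}{n^2} + O\!\left(\frac{1}{n^3}\right), \qquad \kappa_4(l_r) = \frac{48\lambda_r^4}{n^3} + O\!\left(\frac{1}{n^4}\right).$$

It may also be noted that the covariance between $l_r$ and $l_s$ ($r \ne s$) is 

$$\frac{2}{n^2} \left( \frac{\lambda_r \lambda_s}{\lambda_r - \lambda_s} \right)^2 + O\!\left(\frac{1}{n^3}\right).$$

A better estimate of $\lambda_r$ than $l_r$, having a bias only of order $1/n^2$, is provided by 

$$\hat\lambda_r = l_r \left\{ 1 - \frac{1}{n} \sum_{i=1}^{k} \left( \frac{l_i}{l_r - l_i} \right) - \frac{p-k}{n} \left( \frac{\lambda}{l_r - \lambda} \right) \right\},$$

where in the summation i does not take the value r and where an estimate is substituted 
for $\lambda$ if necessary. The variance of this is found to be 

$$\frac{2\lambda_r^2}{n} \left\{ 1 + \frac{1}{n} {\sum_i}' \left( \frac{\lambda_i}{\lambda_r - \lambda_i} \right)^2 \right\} + O\!\left(\frac{1}{n^3}\right).$$

This gives an idea of how near the distribution of $\hat\lambda_r$ is to that of a variance estimate with 
n degrees of freedom. 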
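The bias-corrected estimate can be illustrated with a short function. The sketch assumes the correction takes the form $l_r\{1 - n^{-1}\sum_{i \le k,\, i \ne r} l_i/(l_r - l_i) - (p-k)n^{-1}\lambda/(l_r - \lambda)\}$, which removes the first-order bias in the expectation of $l_r$; the function name, zero-based index and sample values are hypothetical.

```python
def corrected_root(dominant_roots, r, lam, p, n):
    """Bias-corrected estimate of the r-th dominant latent root (r is zero-based).

    dominant_roots: the k largest sample roots l_1, ..., l_k
    lam: the common residual root, or an estimate of it
    p: number of variates; n: degrees of freedom
    """
    k = len(dominant_roots)
    l_r = dominant_roots[r]
    among_dominant = sum(l_i / (l_r - l_i)
                         for i, l_i in enumerate(dominant_roots) if i != r)
    residual = (p - k) * lam / (l_r - lam)
    return l_r * (1 - among_dominant / n - residual / n)


# Two dominant roots, three residual roots equal to 1, n = 100.
print(corrected_root([10.0, 4.0], 0, lam=1.0, p=5, n=100))
```

In this example the largest root is shrunk slightly, from 10 to about 9.9, reflecting the upward bias of the largest sample root.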





5. RESIDUAL LATENT ROOTS OF A CORRELATION MATRIX 


We consider, finally, the more difficult problem concerning the latent roots of a correlation 
matrix. To simplify matters we may, without loss of generality, assume that the true 
variances of the p variates $x_i$ are all unity, so that the true correlation matrix is equal to the 
covariance matrix $A_0$ (not now diagonal). Assuming that the first k of the latent roots $\lambda_i$ of 
this matrix are distinct, we wish to test the hypothesis that the residual p − k latent roots 
are equal, the common value $\lambda$ being, however, unknown. 
Let R be the sample correlation matrix and let $l_i$ (i = 1, 2, ..., p) represent the latent 
roots, in order of magnitude, of R. Then the criterion used for testing the above hypothesis 
(corresponding to Bartlett's case IIIc) may be written as 

$$n[-\log_e (l_{k+1} l_{k+2} \cdots l_p) + (p-k) \log_e \{(l_{k+1} + l_{k+2} + \cdots + l_p)/(p-k)\}], \tag{7}$$

where we have taken the multiplying factor simply as n, since this is sufficiently accurate 
for our purpose. 

The difficulty here is that the criterion does not, even asymptotically, follow a $\chi^2$ 
distribution, though it will approximately do so if $\lambda_1, \lambda_2, \ldots, \lambda_k$ are large and $\lambda$ small. Even 
then, as Bartlett has remarked, the effective number of degrees of freedom depends on the 
amount of variance removed from each variate by the first k components. Our object is to 
determine the effective number of degrees of freedom in the general case, in other words to 
find the expectation of (7). We shall content ourselves with a lower order of approximation 
than before and shall neglect terms of order 1/n or less in this expectation. 

Let V and $\delta V$ be the diagonal matrices whose ith elements are respectively $a_{ii}$ and $\delta a_{ii}$. 
Then 

$$V = I + \delta V \quad\text{and}\quad R = V^{-1/2} A V^{-1/2}.$$

Let $Q = [Q_1 \,\vdots\, Q_2]$ be an orthogonal matrix whose first k columns, $Q_1$, represent the column 
vectors corresponding, in correct order, to the latent roots $\lambda_1, \lambda_2, \ldots, \lambda_k$ of $A_0$. This means that 

$$Q' A_0 Q = \begin{bmatrix} \Lambda & 0 \\ 0 & \lambda I \end{bmatrix},$$

where $\Lambda$ is as previously defined. 
Now the latent roots of R are the same as those of $A V^{-1}$, or of 

$$Q' A V^{-1} Q = Q'(A_0 + \delta A)(I + \delta V)^{-1} Q = Q' A_0 Q + Q'\, \delta A\, Q - Q' A_0\, \delta V\, Q + O(\delta^2),$$

a matrix whose non-diagonal elements are all of order $\delta$. Hence if we neglect the squares of 
these, the residual latent roots $l_{k+1}, l_{k+2}, \ldots, l_p$ may be equated to the latent roots of the 
sub-matrix consisting of the last p − k rows and columns, which is 

$$H = \lambda I + Q_2'(\delta A - \lambda\, \delta V) Q_2 + O(\delta^2).$$

The substitution of the determinant and trace of H for the product and sum of the residual 
latent roots in (7) will involve an error only of order $\delta^3$, so to this order of approximation the 
expression (7) may be written as 

$$n \left[ -\log_e |H| + (p-k) \log_e \left\{ \frac{\operatorname{tr}(H)}{p-k} \right\} \right].$$

This is equivalent to 

$$\frac{n}{2\lambda^2} \left[ \operatorname{tr}\{Q_2'(\delta A - \lambda\, \delta V) Q_2\}^2 - \frac{1}{p-k} \{\operatorname{tr} Q_2'(\delta A - \lambda\, \delta V) Q_2\}^2 + O(\delta^3) \right]$$
$$= \frac{n}{2\lambda^2} \left[ \operatorname{tr}\{C(\delta A - \lambda\, \delta V)\}^2 - \frac{1}{p-k} \{\operatorname{tr} C(\delta A - \lambda\, \delta V)\}^2 + O(\delta^3) \right], \tag{8}$$

where $C = Q_2 Q_2' = I - Q_1 Q_1'$. 

In order to evaluate the expectation of (8) we make use of the well-known result 

$$n E(\delta a_{gh}\, \delta a_{ij}) = \alpha_{gi}\alpha_{hj} + \alpha_{gj}\alpha_{hi},$$

where, as before, $\alpha_{ij}$ denotes the true covariance (or correlation) between $x_i$ and $x_j$. We then 
have 

$$n E\{\operatorname{tr}(C\, \delta A)^2\} = (\operatorname{tr} C A_0)^2 + \operatorname{tr}(C A_0)^2 = \lambda^2 (\operatorname{tr} C)^2 + \lambda^2 \operatorname{tr} C = \lambda^2 (p-k)(p-k+1),$$
$$n E\{(\operatorname{tr} C\, \delta A)^2\} = 2 \operatorname{tr}(C A_0)^2 = 2\lambda^2 (p-k),$$
$$n E\{\operatorname{tr}(C\, \delta A\, C\, \delta V)\} = 2\lambda^2 \sum_i c_{ii}^2,$$
$$n E\{(\operatorname{tr} C\, \delta A)(\operatorname{tr} C\, \delta V)\} = 2\lambda^2 \sum_i c_{ii}^2,$$
$$n E\{\operatorname{tr}(C\, \delta V)^2\} = 2 \sum_i \sum_j c_{ij}^2 \alpha_{ij}^2,$$
$$n E\{(\operatorname{tr} C\, \delta V)^2\} = 2 \sum_i \sum_j c_{ii} c_{jj} \alpha_{ij}^2,$$

where $c_{ij}$ is the typical element of C. 

Using these results and neglecting quantities of order 1/n, we find the expectation of 
(8) to be 

$$\tfrac12 (p-k-1)(p-k+2) - \frac{1}{p-k} \left\{ 2\lambda (p-k-1) \sum_i c_{ii}^2 - (p-k) \sum_i \sum_j (c_{ij}^2 \alpha_{ij}^2) + \sum_i \sum_j (c_{ii} c_{jj} \alpha_{ij}^2) \right\}. \tag{9}$$

This expression represents the effective number of degrees of freedom for the approximate $\chi^2$. 
In practice the quantities $\lambda$, $c_{ij}$ and the correlation coefficients $\alpha_{ij}$ would usually have to be 
replaced by sample estimates. 
In the limiting case where all correlations tend to unity and $\lambda \to 0$, the number of 
degrees of freedom becomes $\tfrac12(p-k-1)(p-k+2)$. Furthermore, expression (8) is then 
asymptotically equivalent to 

$$\frac{n}{2\lambda^2} \left[ \operatorname{tr}(Q_2'\, \delta A\, Q_2)^2 - \frac{1}{p-k} \{\operatorname{tr}(Q_2'\, \delta A\, Q_2)\}^2 \right],$$

which is of the correct $\chi^2$ form. This is so because the $\tfrac12(p-k)(p-k+1)$ distinct elements 
of the matrix $Q_2'\, \delta A\, Q_2$ are asymptotically normally and independently distributed, the 
diagonal elements having variance $2\lambda^2/n$ and the non-diagonal elements having variance $\lambda^2/n$. 
In the particular case considered by Bartlett (1951) all true correlations are equal to 
$\rho$ and the p − 1 residual roots of R are to be tested for equality after the effect of the first 
has been removed. Here the first true latent root, $\lambda_1$, is $1 + (p-1)\rho$, the residual roots are 
all $1 - \rho$ and the latent column vector $Q_1$ corresponding to $\lambda_1$ is $(1/\sqrt{p})\{1, 1, \ldots, 1\}$. Hence putting 
$\lambda = 1 - \rho$, $\alpha_{ii} = 1$, $\alpha_{ij} = \rho$, $c_{ii} = (p-1)/p$, $c_{ij} = -1/p$ ($i \ne j$) we have 

$$\sum_i (c_{ii}^2) = \frac{(p-1)^2}{p},$$
$$\sum_i \sum_j (c_{ij}^2 \alpha_{ij}^2) = \frac{(p-1)(p-1+\rho^2)}{p},$$
$$\sum_i \sum_j (c_{ii} c_{jj} \alpha_{ij}^2) = \frac{(p-1)^2}{p} \{1 + (p-1)\rho^2\},$$

and the expression (9) becomes 

$$\tfrac12 (p-2)(p+1) - \frac{(p-1)(p-2)}{p} (1-\rho)^2.$$

This agrees with the value found by Bartlett. 
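The effective number of degrees of freedom can be evaluated numerically once C, λ and the correlations are given. The sketch below assumes expression (9) has the form $\tfrac12(q-1)(q+2) - q^{-1}\{2\lambda(q-1)\sum c_{ii}^2 - q\sum\sum c_{ij}^2\alpha_{ij}^2 + \sum\sum c_{ii}c_{jj}\alpha_{ij}^2\}$ with q = p − k, and checks it on the equicorrelated case, where C = I − 11'/p. Function names and the test values p = 5, ρ = 0.3 are illustrative.

```python
def effective_df(corr, c, lam, k):
    """Effective degrees of freedom from expression (9).

    corr: true (or estimated) correlation matrix as a list of rows
    c: the matrix C = I - Q1 Q1' as a list of rows
    lam: common value of the residual latent roots
    k: number of dominant roots removed
    """
    p = len(corr)
    q = p - k
    idx = range(p)
    sum_cii = sum(c[i][i] ** 2 for i in idx)
    sum_cij = sum((c[i][j] * corr[i][j]) ** 2 for i in idx for j in idx)
    sum_cc = sum(c[i][i] * c[j][j] * corr[i][j] ** 2 for i in idx for j in idx)
    return (0.5 * (q - 1) * (q + 2)
            - (2 * lam * (q - 1) * sum_cii - q * sum_cij + sum_cc) / q)


def equicorrelated_case(p, rho):
    """Correlation matrix with all correlations rho, and C = I - (1/p) 11'."""
    corr = [[1.0 if i == j else rho for j in range(p)] for i in range(p)]
    c = [[(1.0 if i == j else 0.0) - 1.0 / p for j in range(p)] for i in range(p)]
    return corr, c


p, rho = 5, 0.3
corr, c = equicorrelated_case(p, rho)
general = effective_df(corr, c, lam=1 - rho, k=1)
closed_form = 0.5 * (p - 2) * (p + 1) - (p - 1) * (p - 2) / p * (1 - rho) ** 2
print(general, closed_form)
```

Both numbers come out at about 7.824, noticeably below the nominal ½(p − 2)(p + 1) = 9 degrees of freedom.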


I am indebted to Prof. T. W. Anderson for showing me the draft of a paper by him on 
‘Asymptotic theory for principal component analysis’, which is to be published at a later 
date. In it he suggests that a better approximation to the distribution of the criterion 
considered in this section, would be obtained by using cy3, where c and d are constants and 
where x3 denotes a y? variate with d degrees of freedom. In order to fit the constants 
c and d it would, however, be necessary to find the variance as well as the expectation of (7), 
and this would clearly be an unpleasantly complicated expression. 


Note added in proof. The conjecture that use of the correct multiplying factors for
the criteria of §§2 and 3 would make all moments agree with those of χ², neglecting
quantities of order 1/n?, is now known to be correct. This follows from a general result, 
to be published later, concerning likelihood ratio criteria. 
THE VARIANCE OF THE MEAN OF SYSTEMATIC SAMPLES 


By R. M. WILLIAMS 
Applied Mathematics Laboratory, D.S.I.R., New Zealand 


1. INTRODUCTION 


Systematic sampling methods have been used in a variety of fields, notably in forestry and 
ecological studies, because of their convenience in the field and their advantages over 
a randomized scheme if the survey is also to be used for mapping. Their use was justified 
by work such as that of Hasel (1938) and Osborne (1942), who examined the sampling errors 
of random, stratified random and systematic surveys, by subjecting detailed forestry 
survey data to analysis by these three methods. These analyses showed that the systematic 
designs were generally the most efficient. They left unsolved the problem of estimating the 
error of the systematic sample, although Osborne’s paper gave a method based on computing 
the mean square serial correlation which gave reasonable agreement with experimental 
results. 

More recently, Finney (1948), also using data from forest surveys, has compared the 
variances of systematic samples (computed by the overlapping group method to obtain 
reasonable precision), random samples and stratified random samples with one and two 
members per stratum. In these cases Finney showed that the variance of the systematic 
sample differed little from the variance of a stratified random sample of the same size with 
one observation per stratum, but was appreciably smaller than the variance obtained with 
half the number of strata and two observations per stratum, which appeared to be the most 
efficient system giving an unbiased estimate of the variance without supplementary 
information. Further investigation of the variance of systematic sampling with various 
models seems likely to be useful. 

Yates (1948) examined the sampling variance for a number of particular cases and gave 
a form (p. 362) which, with supplementary information, provided an estimate of the sampling 
variance of the means of a systematic sample, or, alternatively, subject to certain assumptions,
provided an upper limit to the error. Jowett (1952) gave a method for the determination
of the variance of a systematic sample, on the assumption that the observations were 
derived from a stationary process. This was derived from earlier work by Cochran (1946) 
and Quenouille (1949) in the same field. 

In this paper we shall develop a method similar to Jowett’s, but involving rather weaker 
assumptions; we shall also derive a form for the variance of samples with spacings less than 
or equal to that of the observed data (which is not covered by Jowett’s method); this 
includes the case where the variance is derived from a single sample. This does not conflict 
with the often-stated principle that a systematic sample cannot by itself provide an 
estimate of error, since the assumptions which are made about the population are equivalent 
to supplementary information. Many alternative assumptions could have been made, and 
the justification of the particular ones chosen must be that they appear likely to apply to 
a wide variety of data and in the cases considered below yield results which agree with 
experiment. 
These investigations arose out of discussions with officers of the North Canterbury 
Catchment Board (New Zealand) on the spacing of points along a line on which observations 
were to be made to determine the percentage of ground with various types of cover. The 
main part of the theory is applicable to any series of observations spaced out uniformly in 
time or space which fit the model described below and holds for both discrete and continuous 
variables. 

We shall, for convenience, always refer to the transect, meaning the whole line along 
which observations are made, and to the points on it. The details of the experimental work 
are given in § 4. 

2. FORMULAE FOR VARIANCES 


We regard the transect as composed of kn points; the sample is drawn by taking every kth
point, giving k possible samples. The variable at the ith point is x_i (i = 1, 2, ..., kn). The
formula for the sampling variance σ²(n,k) of the mean of a systematic sample (defined as
the mean square of the deviation of the sample mean from the population mean x̄) drawn
from a finite population is given by Madow & Madow (1944) in the first of a series of
three papers (Madow, 1949, 1953), and may be derived in a convenient form from the
usual partition of sums of squares and the relationship

\[ \sum_{i=1}^{kn} (x_i-\bar{x})^2 = \sum_{i=1}^{kn}\sum_{j>i} (x_i-x_j)^2/kn, \]

which gives

\[
\begin{aligned}
\sigma^2(n,k) &= \frac{1}{kn}\sum_{i=1}^{kn}(x_i-\bar{x})^2 - \frac{1}{kn^2}\sum_{\delta=1}^{n-1}\sum_{i=1}^{k(n-\delta)}(x_i-x_{i+k\delta})^2 \\
&= \left(\frac{f(0)}{n}+2f_n\right) - \left(\frac{f(0)}{kn}+2f_{kn}\right),
\end{aligned}
\tag{1}
\]

where

\[
\begin{aligned}
f(0) &= \sum_{i=1}^{kn}(x_i-a)^2/kn, \\
f_n &= \frac{1}{n}\sum_{\delta=1}^{n-1}\frac{1}{kn}\sum_{i=1}^{k(n-\delta)}(x_i-a)(x_{i+k\delta}-a), \\
f_{kn} &= \frac{1}{kn}\sum_{y=1}^{kn-1}\frac{1}{kn}\sum_{i=1}^{kn-y}(x_i-a)(x_{i+y}-a).
\end{aligned}
\tag{2}
\]

Here a is any constant.
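
As an illustrative check on (1), the following Python sketch computes σ²(n,k) in two ways: directly, as the mean square deviation of the k systematic sample means from the population mean, and from the quantities f(0), f_n and f_kn of (2). The function names and the test series are ours and are not part of the original paper; the series length is assumed to be exactly kn.

```python
import random


def systematic_variance_direct(x, k):
    """Mean square deviation of the k systematic sample means from the population mean."""
    n = len(x) // k
    pop_mean = sum(x) / len(x)
    means = [sum(x[j::k]) / n for j in range(k)]
    return sum((mu - pop_mean) ** 2 for mu in means) / k


def systematic_variance_formula(x, k, a=0.0):
    """Equation (1): (f(0)/n + 2 f_n) - (f(0)/kn + 2 f_kn), for any constant a."""
    total = len(x)              # kn
    n = total // k
    d = [xi - a for xi in x]
    f0 = sum(di * di for di in d) / total
    # products of points separated by multiples of k (same systematic sample)
    f_n = sum(d[i] * d[i + k * delta]
              for delta in range(1, n)
              for i in range(total - k * delta)) / (n * total)
    # products of all pairs of distinct points
    f_kn = sum(d[i] * d[i + y]
               for y in range(1, total)
               for i in range(total - y)) / (total * total)
    return (f0 / n + 2 * f_n) - (f0 / total + 2 * f_kn)


if __name__ == "__main__":
    rng = random.Random(0)
    series = [rng.random() for _ in range(60)]   # hypothetical data, kn = 60
    print(systematic_variance_direct(series, 5))
    print(systematic_variance_formula(series, 5, a=0.3))
```

The two results should agree to rounding error for any choice of a, which is the sense in which a is arbitrary in (2).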


We let k tend to infinity and define s as i/kn, x(s) as the corresponding value of x_i and
φ(t) as

\[ \phi(t) = \int_0^{1-t} (x(s)-a)(x(s+t)-a)\,ds. \tag{3} \]

In the limit

\[ f_n \to \phi_n = \frac{1}{n}\sum_{\delta=1}^{n-1}\phi(\delta/n), \qquad f_{kn} \to \bar{\phi} = \int_0^1 \phi(t)\,dt, \]

and thus

\[ \sigma^2(n,k) \to \frac{\phi(0)}{n} + 2\phi_n - 2\bar{\phi} = \sigma^2(n). \tag*{(4)*} \]


* Jowett (1952) gave this formula in a slightly different form, Quenouille (1949) gave the form
when n is large as well as k.
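An alternative form for σ²(n) is obtained from the Euler–Maclaurin theorem, from which
it follows that if φ(t) has continuous derivatives up to the (2q+1)th order

\[
\frac{1}{n}\sum_{s=0}^{n}\phi\!\left(\frac{s}{n}\right)
= \int_0^1 \phi(t)\,dt + \frac{1}{2n}\bigl(\phi(0)+\phi(1)\bigr)
+ \sum_{v=1}^{q}\frac{B_{2v}}{(2v)!\,n^{2v}}\bigl(\phi^{(2v-1)}(1)-\phi^{(2v-1)}(0)\bigr)
+ \frac{(-1)^{q+1}}{n^{2q+1}}\int_0^1 P_{2q+1}(nx)\,\phi^{(2q+1)}(x)\,dx,
\]

where φ^(i) denotes the ith derivative, B_2v are the Bernoulli numbers (B_2 = 1/6,
B_4 = −1/30, B_6 = 1/42, ...) and

\[ P_{2q+1}(x) = \sum_{l=1}^{\infty}\frac{\sin(2l\pi x)}{2^{2q}(l\pi)^{2q+1}}. \]

Assuming that the series converges we have

\[
\sigma^2(n) = \frac{\phi'(1)-\phi'(0)}{6n^2} - \frac{\phi'''(1)-\phi'''(0)}{360n^4}
+ \frac{\phi^{(5)}(1)-\phi^{(5)}(0)}{15120n^6} - \dots
\tag{5}
\]

Fig. 1. Graph of φ(t) showing shaded areas proportional to σ²(n). (Not to scale.)
[Vertical axis: φ(t). Horizontal axis: t, marked at 0, 1/n, 2/n, 3/n, ..., t₀, 1.]
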

The assumption that φ(t) is differentiable to the required order is one that in general will
not be fulfilled by the observed points in a particular transect. If, however, we follow
Cochran (1946) and regard the observation of a particular transect as a sample drawn from
a multivariate parent population, then it will often be reasonable to assume that the
expected values of φ(t), etc. (the expectation being taken over all possible drawings from
the parent population) do have the required property of differentiability even if the
individual values are discontinuous. We shall therefore, in future, consider the expected
value of the variance, etc. Since the formulae are unchanged if we replace σ²(n), φ(t), etc.,
by their expected values, and there is no danger of confusion, we shall use the same symbols
for the expected values. Where we wish to emphasize the distinction we shall refer to the
two values as the sample values and the expected values of σ²(n), φ(t), etc. It will be seen
that the distinction, although essential where equation (5) is used, is of less practical
importance in the other case.


Fig. 2. Relative values of variances for monotonic decreasing φ(t). (Not to scale.)
[Vertical axis: variance, with the level φ(0) − 2φ̄ marked. Horizontal axis: spacing 1/n, marked at 1/n = t₀ and 1/n = 1.]


By similar methods we find that the sampling variance of the mean of a random sample
of n members (σ_r²(n)) is given by

\[ \sigma_r^2(n) = \frac{1}{n}\bigl(\phi(0)-2\bar{\phi}\bigr). \tag{6} \]

The functions involved in (4) and (6) are shown in Fig. 1 for a monotonic decreasing φ(t);
φ̄ is given by the area under the curve φ(t); φ(0)/2n + φ_n is the area of the polygon OABCD...,
so that σ²(n) is equal to twice the area shown shaded.*


* This graphical method is equivalent to that given by Jowett (1952). It should however be noted 
that Jowett’s assumption that the x’s form a stationary time series is not invoked. 
From (5) it follows that provided the expansion is valid σ²(n) initially varies as the square
of the distance (1/n) between observations; on the other hand, σ_r²(n) varies linearly with 1/n.
For sufficiently large n, σ_r²(n) will therefore be greater than σ²(n), and the systematic sample
more efficient than the random.
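
A small numerical illustration of (4), (5) and (6) may be helpful. The Python sketch below assumes, purely for illustration, the smooth form φ(t) = (1 − t)e^{−Ct} with C = 5; neither this form nor the value of C comes from the paper. It evaluates the systematic variance from (4), the leading term of (5), and the random-sample variance from (6).

```python
import math

C = 5.0  # hypothetical decay rate, chosen only for this illustration


def phi(t):
    """Assumed form phi(t) = (1 - t) exp(-C t); note phi(1) = 0."""
    return (1.0 - t) * math.exp(-C * t)


def phi_bar():
    """Exact integral of phi over (0, 1)."""
    return 1.0 / C - (1.0 - math.exp(-C)) / C ** 2


def systematic_variance(n):
    """Equation (4): phi(0)/n + 2 phi_n - 2 phi_bar."""
    phi_n = sum(phi(d / n) for d in range(1, n)) / n
    return phi(0.0) / n + 2.0 * phi_n - 2.0 * phi_bar()


def systematic_leading_term(n):
    """First term of (5): (phi'(1) - phi'(0)) / (6 n^2)."""
    dphi_at_0 = -1.0 - C           # derivative of the assumed phi at t = 0
    dphi_at_1 = -math.exp(-C)      # derivative of the assumed phi at t = 1
    return (dphi_at_1 - dphi_at_0) / (6.0 * n ** 2)


def random_variance(n):
    """Equation (6): (phi(0) - 2 phi_bar) / n."""
    return (phi(0.0) - 2.0 * phi_bar()) / n


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, systematic_variance(n), systematic_leading_term(n), random_variance(n))
```

Under these assumptions, doubling n should reduce the systematic variance by a factor close to four and the random variance by a factor of two.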

If φ(t) is zero or changes linearly beyond some point t₀, then as the spacing of the sampling
points (1/n) increases beyond t₀, σ²(n) will increase linearly with 1/n (see Fig. 2).

The variance σ²(n) is a measure of the variability of the mean of all samples drawn from
the same transect; if we denote by σ_p²(n) the mean square of the deviations of the mean of
a sample about ∫₀¹ μ(t) dt, where μ(t) is the mean at the point t of the parent population from
which the points of the transect are drawn, then

\[ \sigma_p^2(n) = \sigma^2(n) + 2\bar{\phi}', \tag{7} \]

where φ̄′ is the value of φ̄ when a is put equal to ∫₀¹ μ(t) dt (see Fig. 2). This variance is of
interest if we wish to determine the significance of suspected changes in ∫₀¹ μ(t) dt, and we
are able to assume that the covariance matrix of the x's remains the same. This would
require justification; for example, in the transects considered in this paper, the gradual
replacement of one species of plant by another might lead not only to changes in the
proportions of covered ground but also in the covariance matrix.


3. ESTIMATION 


3.1. To determine σ²(n) we must, ideally, know φ(t) at all points in the range (0, 1). We
estimate this from a sample x_r (r = 1, 2, ..., m) of m points spaced uniformly 1/m apart with
the first point randomly located in (0, 1/m). For such a sample we define

\[ \psi(u/m) = \frac{1}{m}\sum_{r=1}^{m-u}(x_r-a)(x_{r+u}-a); \tag{8} \]

ψ(u/m) is an unbiased estimate of φ(u/m).
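
The estimate (8) is simple to compute. A minimal Python sketch follows; the function name `psi` is ours, and the divisor is taken to be m, as in (8) read together with the definition (3) of φ(t).

```python
def psi(x, u, a=0.0):
    """Equation (8): (1/m) * sum over r = 1..m-u of (x_r - a)(x_{r+u} - a)."""
    m = len(x)
    return sum((x[r] - a) * (x[r + u] - a) for r in range(m - u)) / m


if __name__ == "__main__":
    sample = [1, 0, 1, 1]          # hypothetical 0/1 record of m = 4 points
    print([psi(sample, u) for u in range(4)])
```

For 0/1 data with a = 0 the sum simply counts the pairs of 1's that lie u places apart.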

If we can assume a specific form (e.g. exponential) for φ(t) we can fit this to the observed
values of ψ(u/m), preferably for values of u sufficiently low to ensure that they are based on
a large number of points, and determine σ²(n) either graphically or from the series form.

Where this is not possible the determination of φ(t) presents no difficulties for low values
of t and consequently the differential coefficients at t = 0, required for the series form, can
easily be determined; but as t approaches unity the number of terms involved will be too
small to provide satisfactory estimates. For the graphical form the same objection holds,
and in addition it is desirable to avoid the labour of computing φ(t) at all points if possible.

If φ(t) is linear above some value t₀ then in the graphical method no further contribution
to σ²(n) will be made by continuing φ(t) beyond t₀. In practice it will probably be sufficient
to continue φ(t) to a point such that it is effectively linear over lengths equal to the greatest
sampling intervals likely to be of interest.

Assuming the expansion is valid, the error involved in considering only the contribution
up to t₀ is given by

\[ 2\sum_{v=1}^{\infty}\frac{B_{2v}}{(2v)!\,n^{2v}}\bigl[\phi^{(2v-1)}(1)-\phi^{(2v-1)}(t_0)\bigr]. \]

v=1 
If the series form is used some equivalent assumption is necessary. If it is assumed that
φ(t) is linear from t = t₀ onwards, then φ′(1) = −φ(t₀)/(1 − t₀) and all higher derivatives are
zero. An alternative to this is to assume that in the region (0, 1 − t₀), x(s) has a mean α₁, and
is independent of x(s) in the region (t₀, 1), where the mean is α₂; then for t > t₀ > ½

\[ \phi(t) = (1-t)(\alpha_1-a)(\alpha_2-a). \]

Visual inspection of the observations should provide some check on the validity of this
assumption, and φ(t) can then be determined directly from the means at the beginning and
end of the transect. Even if this assumption does not hold exactly it does not appear to be
very sensitive to deviations. For example, if μ₁ and μ₂ are not constant but each is subject
to a different linear trend of the form

\[ \mu_1 = \alpha_1 + \beta_1 t, \qquad \mu_2 = \alpha_2 + \beta_2 t, \]

then

\[
\phi(t) = (1-t)(\alpha_1-a)(\alpha_2+\beta_2-a)
+ \frac{(1-t)^2}{2}\bigl[\beta_1(\alpha_2+\beta_2-a)-\beta_2(\alpha_1-a)\bigr]
- \frac{(1-t)^3}{6}\beta_1\beta_2,
\]

which gives

\[ \phi'(1) = -(\alpha_1-a)(\alpha_2+\beta_2-a), \qquad \phi'''(1) = \beta_1\beta_2. \]

The assumption of constant means in the regions (0, 1 − t₀) and (t₀, 1) gives

\[
\phi'(1) = -(\alpha_1-a)(\alpha_2+\beta_2-a)
- \frac{1-t_0}{2}\bigl(\beta_1(\alpha_2+\beta_2-a)-\beta_2(\alpha_1-a)\bigr)
+ \frac{(1-t_0)^2}{4}\beta_1\beta_2,
\]

so that, if t₀ is taken as large as possible consistent with basing the means on a reasonable
sample, the error will not be large unless the trend is marked, i.e. β₁ and β₂ are large.


3.2. When a pilot survey (of m points) has been undertaken to provide design data we
may want to calculate σ²(n) for values of n greater or less than m; where the variance of the
mean of a sample is to be calculated from the sample itself m = n.

When n is substantially smaller than m, say half or less, the graphical method is best since
the series form will converge slowly. When n is either of the same order as or greater than m
the sampling fluctuations in determining the values of φ(t) will be comparable with
the differences to be measured, and the series form must be used.

The differential coefficients required at t = 0 may be determined by the Gregory–Newton
formulae, i.e. by putting

\[
\begin{aligned}
\phi'(0) &= m\bigl(\Delta\psi(0) - \tfrac{1}{2}\Delta^2\psi(0) + \dots\bigr), \\
\phi'''(0) &= m^3\bigl(\Delta^3\psi(0) - \tfrac{3}{2}\Delta^4\psi(0) + \dots\bigr),
\end{aligned}
\]

where

\[
\begin{aligned}
\Delta\psi(0) &= \psi(1/m) - \psi(0), \\
\Delta^2\psi(0) &= \psi(2/m) - 2\psi(1/m) + \psi(0), \quad \text{etc.}
\end{aligned}
\]

Since the sampling variance of Δ^i ψ(0) increases rapidly with i it will be better to use only
a few terms in estimating φ′(0), φ‴(0), etc., and for the same reason to use only a few terms
in the actual expansion of σ²(n). This means that the series expansion is only useful when
n is large enough to ensure rapid convergence, since otherwise serious bias may occur. In
many practical cases precise estimates of the variance are not needed, and fairly large errors
may be tolerated.
3.3. When the observations are subject to a periodic effect it may cause serious errors if
the period is close to 1/m or a submultiple of it, since in the graphical method it will not be
justifiable to draw a smooth curve through the calculated values of ψ(u/m); the series form
may not converge sufficiently rapidly to provide reliable results. If the existence of a period
can be expected from the nature of the data, then this difficulty can be avoided by making
1/m a sufficiently small fraction of the period. Finney (1950) gave an example of an
unexpected periodic variation occurring in a forestry survey, for which there appeared to
be no simple explanation; the only safeguard against effects of this type is to take the
observations so close that it is at least unlikely that the periodic effect will be undetected.


3.4. In the particular case where x(s) has the value 0 or 1, equation (8) has a very simple
form if we put a = 0. Writing v_u for the number of 1's followed u places later by another
1 we have

\[ \psi(u/m) = v_u/m. \]

In this case we can set limits on the gradient at t = 1 since

\[ 0 \le \phi(t) \le (1-t), \]

so that

\[ -1 \le \phi'(1) \le 0. \]

If it is assumed that no correlation exists between points at the extreme ends of the transect
then φ′(1) = −p₁p₂, where p₁ and p₂ are the proportions of 1's at the beginning and end of
the transect.

If m is large so that we can neglect all but the first differences, we have

\[ \sigma^2(n) = \frac{m_r}{12n^2}, \tag{9} \]

where m_r is the number of runs of 1's and 0's in the set of m observations.

This approximation is only valid if m is so large that, because of the high correlation
between adjacent points, the number of runs is altered relatively little by small changes in m.
We can, for example, find the number of runs in the series obtained by omitting alternate
observations; if this differs little from m_r we can use this form as a quick estimate of the
variance.
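
The quick estimate (9), and the check by omitting alternate observations, can be sketched in Python as follows. The function names and the short 0/1 record are hypothetical and serve only to illustrate the arithmetic; a real record would be far longer.

```python
def count_runs(x):
    """Number of runs of identical consecutive values (m_r for a 0/1 record)."""
    if not x:
        return 0
    return 1 + sum(1 for prev, cur in zip(x, x[1:]) if cur != prev)


def runs_variance(x, n=None):
    """Equation (9): m_r / (12 n^2); n defaults to m, the number of observations."""
    if n is None:
        n = len(x)
    return count_runs(x) / (12.0 * n * n)


if __name__ == "__main__":
    record = [1, 1, 0, 0, 0, 1, 1]            # hypothetical observations
    print(count_runs(record))                  # runs in the full record
    print(count_runs(record[::2]))             # runs after omitting alternate observations
    print(runs_variance(record))
```

If the two run counts differ materially, the record is not dense enough for (9) to be trusted.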


4. EXPERIMENTAL RESULTS 


The experimental data consisted of a record of vegetative cover which was classified as 
either living vegetation, dead vegetation or bare ground. The vegetation comprised a portion 
of the native tussock grasslands that occur on the mountainous area eastwards of the main 
divide of the Southern Alps of the South Island, New Zealand. Field work was carried out 
in the Waimakariri river catchment, where the soil is largely derived from steep and rugged 
mountains, composed of hard greywacke rock, which rise to some 7000 ft. above sea-level. 
For a century sheep have grazed the tussock grasslands which have been divided by natural 
barriers and fences into extensive blocks. 

The point method of pasture analysis was used and line transects of 10 chains and more
in length were permanently pegged. The point analyser consisted of a metal frame 20 in.
long which carried a row of steel pins 2 in. apart. At the same calendar date in the summer
of each year the analyser was placed at a series of adjacent positions along the transect and
the state of the cover beneath each pin was recorded; this gave a set of observations
uniformly spaced every 2 in. along the transect.

Table 1. The number v_u of points followed by a point with similar cover at spacing u

                    Transect 4            Transect 7            Transect 11       Transect 8
    u             L      D      B       L      D      B       L      D      B          L

    0          1570    201   2226    4713    343   1227    2955    127    676       3335
    1          1362     58   1992    4315     84    933    2787     33    514       2750
    2          1252     38   1895    4194     55    848    2718     24    456       2498
    3          1169     29   1815    4116     43    802    2668     14    424       2340
    4          1100     25   1745    4061     36    750    2635     16    401       2214
    5          1046     27   1681    4015     31    717    2601     12    381       2151
    6           974     20   1623    3991     33    694    2574     11    365       2077
    7           919     16   1579    3974     27    668    2556      8    350       2042
    8           872     19   1530    3936     21    647    2534      6    338       2025
    9           829     13   1488    3924     24    630    2518      8    320       1999
   10           797     11   1454    3909     18    626    2506      4    314       1974
   11           773      9   1426    3888     18    615    2499      3    306       1962
   12           762     11   1411    3869     18    595    2478      4    294       1968
   13           743     15   1398    3855     17    586    2464      7    281       1974
   14           733     11   1382    3856     21    581    2451      6    271       1981
   15           724      8   1369    3847     17    581    2444      6    268       1979
   16           724      9   1369    3832     18    561    2434      8    260       1966
   17           730      4   1373    3812     18    550    2427      7    259       1936
   18           728     12   1368    3809     17    557    2418      6    251       1927
   19           719     16   1363    3788     14    544    2414      3    250       1915
   20           718     11   1363    3784     13    530    2408      4    245       1936
   21           709     14   1348    3786     21    535    2404      1    243       1951
   22           704     13   1336    3788     24    522    2407      5    253       1941
   23           693     11   1330    3779     17    524    2404      5    249       1945
   24           694     11   1325    3778     21    522    2389      8    236       1942
   25           696     10   1322    3767     25    513    2380      3    232       1944
   26           700     10   1326    3755     22    501    2387      5    236       1958
   27           703     14   1335    3745     16    492    2395      4    236       1945
   28             -      -      -    3745     19    488    2398      4    240       1931
   29             -      -      -    3746     14    483    2396      4    244       1918
   30             -      -      -    3745     19    473    2398      9    237       1910
   35             -      -      -       -      -      -       -      -      -       1875
   40             -      -      -       -      -      -       -      -      -       1826

   No. of
   points (m)         3997                  6283                  3758              6321

   Percentage
   of cover   39.28   5.03  55.69   75.01   5.46  19.53   78.63   3.38  17.99      52.76

Four transects covering a wide range of conditions were chosen for study; these are 
referred to as transects 4, 7, 8 and 11. A brief description of each is given in the Appendix. 

We consider separately the three cases living or not living, dead or not dead, bare or not
bare; the first is the most important ecologically. We describe the state of the ground at
the point s by putting x(s) = 1 if there is living cover, x(s) = 0 if not, and similarly for the
other cases.


Table 2. The series for σ²(n)

Transect 4, m = 3997
    V_L = (10⁴/n²)(46.75 − 0.197 m²/n² ...)
    V_D = (10⁴/n²)(40.30 − ...)
    V_B = (10⁴/n²)(57.03 − 0.381 m²/n² ...)

Transect 7, m = 6283
    V_L = (10⁴/n²)(102.32 − 0.650 m²/n² ...)
    V_D = (10⁴/n²)(74.17 − 0.592 m²/n² ...)
    V_B = (10⁴/n²)(75.80 − 0.473 m²/n² ...)

Transect 11, m = 3758
    V_L = (10⁴/n²)(40.59 − 0.222 m²/n² ...)
    V_D = (10⁴/n²)(27.53 − 0.239 m²/n² ...)
    V_B = (10⁴/n²)(39.99 − 0.217 m²/n² ...)

Transect 8, m = 6321
    V_L = (10⁴/n²)(138.48 − 0.664 m²/n² ...)

In Table 1 we give the values of v_u for living, dead and bare cover (L, D, B) for the cases
considered.

Examination of Table 1 shows that for the observations of dead points for spacings
greater than about 50 in. (i.e. u > 25) the value of v_u (apart from random fluctuations) is
approximately that which would be expected if there was no correlation (i.e. (m−u)p²,
where p is the proportion dead). For the living and bare points there is still some correlation
at this spacing, but it is changing sufficiently slowly to regard it as linear for all except large
sampling intervals; this is equivalent to ignoring any contribution to the variance made by
values of φ(t) not tabulated.


10 Biom. 43 
To obtain the coefficients in the series expansion only differences up to the third were used
at t = 0 in order to avoid excessive sampling errors. At t = 1, φ′(1) was taken as −p²; all
other derivatives were taken as zero.

In Table 2 we give the variances of the percentages of living, dead and bare (V_L, V_D, V_B) in
series form. The total number of points in the transect is m so that n sampling points
correspond to a spacing of 2m/n inches.
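
The arithmetic behind the series coefficients can be sketched in Python. The function below takes the first four tabulated counts v_0, ..., v_3 and the number of points m, forms the first three forward differences, applies the Gregory–Newton formula for φ′(0) with φ′(1) = −p², and returns the coefficients of 1/n² and m²/n⁴ in series (5). The function names are ours; the input shown is the living-cover column for transect 4 in Table 1, for which the result can be compared with the first entry of Table 2.

```python
def series_coefficients(v, m):
    """Coefficients c1, c2 in sigma^2(n) ~ c1/n^2 + c2*m^2/n^4 for 0/1 data with a = 0.

    v = [v_0, v_1, v_2, v_3], so that psi(u/m) = v_u / m.
    """
    d1 = v[1] - v[0]                          # m * first difference of psi at 0
    d2 = v[2] - 2 * v[1] + v[0]               # m * second difference
    d3 = v[3] - 3 * v[2] + 3 * v[1] - v[0]    # m * third difference
    p = v[0] / m                              # proportion of cover
    dphi_0 = d1 - d2 / 2 + d3 / 3             # phi'(0), differences up to the third
    dphi_1 = -p * p                           # phi'(1) taken as -p^2
    c1 = (dphi_1 - dphi_0) / 6                # from (phi'(1) - phi'(0)) / 6n^2
    c2 = d3 / 360                             # from phi'''(0) = m^2 * d3, phi'''(1) = 0
    return c1, c2


def series_variance_percent(n, m, c1, c2):
    """Variance of the percentage cover (i.e. scaled by 10^4) for n sampling points."""
    return 1e4 * (c1 + c2 * (m / n) ** 2) / n ** 2


if __name__ == "__main__":
    c1, c2 = series_coefficients([1570, 1362, 1252, 1169], 3997)
    print(round(c1, 2), round(c2, 3))
```

The sign convention here makes c2 negative, matching the subtracted second term in Table 2.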


Table 3. Comparison between the variances of systematic and random samples
drawn from correlated data

Spacing
(2m/n in.)     10 in.             20 in.     40 in.              60 in.     80 in.

Transect 4
   V_L         0.604 (0.543*)     1.746      6.088 (5.852†)      11.156     16.223 (9.223†)
   V′_L        2.996              5.992      11.985              17.977     23.970
   V_D         0.370 (0.373*)     0.902      2.091 (3.233†)      3.330      4.568 (2.538†)
   V′_D        0.571              1.143      2.286               3.429      4.572
   V_B         0.586 (0.610*)     1.959      6.219 (2.059†)      11.118     16.017 (19.519†)
   V′_B        3.090              6.180      12.360              18.540     24.720

Transect 7
   V_L         0.400 (0.420*)     1.141      2.942 (6.200†)      4.991      7.020 (9.120†)
   V′_L        1.496              2.992      5.983               8.975      11.967
   V_D         0.267 (0.273*)     0.644      1.462 (1.863†)      2.282      3.100 (1.918†)
   V′_D        0.409              0.819      1.637               2.456      3.274
   V_B         0.341 (0.310*)     0.842      2.331 (3.229†)      3.480      5.300 (7.065†)
   V′_B        1.257              2.514      5.029               7.543      10.058

Transect 11
   V_L         0.435 (0.510*)     1.606      4.090 (4.494†)      7.819      11.549 (15.366†)
   V′_L        2.235              4.470      8.940               13.410     17.881
   V_D         0.243 (0.277*)     0.633      1.503 (1.769†)      2.444      3.386 (4.796†)
   V′_D        0.434              0.869      1.738               2.606      3.475
   V_B         0.437 (0.463*)     1.389      3.463 (3.426†)      6.400      9.337 (13.535†)
   V′_B        1.917              3.833      7.667               11.500     15.334

Transect 8
   V_L         0.661 (0.610*)     1.791      4.957 (3.176†)      8.474      11.410 (6.744†)
   V′_L        1.969              3.939      7.878               11.817     15.755

* Approximate estimates determined from σ²(n) = m_r/12n².
† Estimates obtained from sampling experiment.


The values of the variance obtained by direct integration are given in Table 3, and for
comparison the variances V′_L, V′_D and V′_B which would be obtained with the same sampling
number if the observations were made at random. At close spacings V_L and V_B are much less
than V′_L and V′_B; the difference is smaller between V_D and V′_D, due to the fact that the dead
vegetation comprised a comparatively small percentage of the whole and occurred more or
less randomly interspersed with the living cover, while the living vegetation, particularly
in transect 4, tended to occur in clumps interspersed with patches of rock or bare ground,
leading to high values of φ(t) at close spacings of the order of 10-20 in. The smaller differences
on transect 8 are consistent with the rather more uniform cover.

The values of the variance calculated from the approximate formula given in (9) for
a spacing of 10 in. (n = m/5) are also shown in Table 3; the agreement is satisfactory. This
example illustrates a point noted in § 3.2. The approximate formula is very nearly equivalent
to using only Δψ(0) to estimate φ′(0) and ignoring all later terms in the series for σ²(n); for
a spacing of 10 in. this gives a better approximation to the variance as determined by
numerical integration than does the series form given in Table 2, which involves Δ²ψ(0)
and Δ³ψ(0) as well.

As a check on the validity of the theory and indirectly on the adequacy of the model we
took 10 samples, starting each at a point randomly chosen from the first 20 points on the
transect and taking every 20th successive point and a similar set of 20 samples taking every
40th successive point. The variances of the means of these samples provide an experimental
estimate of σ²(n,k) given by (1), where k = 20 or k = 40, and kn = m. But σ²(n,k) can
also be regarded as an estimate of σ²(n) − σ²(kn), so that by adding the small term σ²(kn),
given by putting n = m in (9), to the experimental estimate of σ²(n,k), we obtain an estimate
of σ²(n). These are given in brackets next to the corresponding variances determined by the
graphical method (spacings 40 and 80 in.) in Table 3. These comparisons are not independent,
but the mean of the ratio of the experimental variances to the variances obtained by the
graphical method is 1.0946, which is satisfactory.


5. SUMMARY 


The formula given by Madow & Madow (1944) for the variance of the mean of a systematic 
sample drawn from a finite population was applied to a continuous population, and a 
method was developed of determining, subject to certain assumptions, the variance of the 
mean of a systematic sample from single samples. This was applied to the problem of 
estimating the amount of vegetative cover in ecological studies. 


I wish to express my thanks to Dr P. Whittle and other members of the Applied Mathematics
Laboratory for valuable suggestions; to Mr R. D. Dick, Soil Conservation Research 
Officer of the North Canterbury Catchment Board, who raised this problem with the 
laboratory, made available his extensive records, gave much time in discussions and supplied 
the descriptive notes; also to the referees for bringing certain recent work to my notice. 
APPENDIX: DESCRIPTIVE NOTES 


Transect 4, Leith Hill 


The top peg is at 4250 ft. and the line transect extends down the mountain slope for 10 chains at an 
average angle of 29°. In the upper portion, particularly, islands of vegetation (tall tussocks, Danthonia 
flavescens) are surrounded by bare ground with the top soil eroded, and there remains the yellowish 
subsoil down to little subsoil amongst rock fragments. On the lower slopes the tussocks Festuca novae 
zealandae and Poa colensoi form the tussock content of the grassland. Over two dozen species of native 
and introduced grasses and herbaceous plants grow between the tussocks as well as a few shrubby 
plants, such as Pimelia prostrata and a prostrate Dracophyllum species. 


Transect 7, Bridge Hill 


The top peg is at 3450 ft., and the line transect descends at an average angle of 33° for a distance of 
10 chains. The perennial herb Celmisia spectabilis with the tussock Festuca novae zealandae and the 
prostrate much-branched shrub Gaultheria depressa, dominate the physiognomic features in this 
portion of the tussock grasslands. 


Transect 8, Constitution Hill 


The top peg is at 4000 ft., and the line transect descends at an average angle of 33° for a distance 
of 15 chains. Festuca novae zealandae and Poa colensoi are the two common tussock species, and about 
two dozen native and introduced species of grasses and herbs grow in the inter-tussock spaces. Shrubby 
species, such as Pimelia prostrata, Cassinia fulvida, Discaria toumatou and Leptospermum scoparium 
occur occasionally. 

Transect 11, Lower Lyndon 


The top peg is at 3550 ft. and the line transect descends at an average angle of 29°. This portion of 
the tussock grasslands is characterized by shrubby growth comprising Cyathodes colensoi, Gaultheria 
species, Discaria toumatou interspersed with Celmisia spectabilis, Festuca novae zealandae and
representatives of about the two dozen species of native and introduced grasses and herbs common in these 
grasslands. 
GROUPING METHODS IN THE FITTING OF POLYNOMIALS TO 
UNEQUALLY SPACED OBSERVATIONS 


By P. G. GUEST 
University of Sydney, Australia 


INTRODUCTION 


When a polynomial is to be fitted to a series of observations y_i at values x_i of the
independent variable and the number of observations n is large, the observations at neighbouring
values of x_i are often grouped together to give a set of N = n/r pairs of values y_Ni, x_Ni.
These quantities y_Ni, x_Ni are the sums of the r observed values in the particular group.
Since the labour required to fit a polynomial curve increases very greatly as the number of 
observations becomes larger, the time spent in fitting a curve to the N grouped observations 
is only a small fraction of that which would be required if the original set of n observations 
was used. It is for this reason that grouping is often employed before fitting a curve to 
the observed values. 

It is clear, however, that in general the grouped curve will not be the same as the curve 
obtained from the original set of observations. The random sampling errors will affect the 
curves differently in the two cases, but that is a minor matter. What is more important is 
that the grouping will give rise to a loss of information—an increase in standard error and 
a decrease in efficiency—and may also give rise to bias in the estimates. It is the purpose 
of the present paper to investigate the last two effects and to develop criteria by means of 
which the suitability of the grouping method as an approximation to the full least-squares 
solution may be determined in any particular case. 

In an earlier paper (Guest, 1954) an account has been given of the use of grouping methods 
in the fitting of polynomials to equally spaced observations. Unfortunately, no exact 
general treatment can be given in cases where the observations are not spaced at equal 
intervals. The procedure adopted here is to describe the departure from uniform spacing 
by two parameters κ₂ and κ₃. κ₂ is a measure of the crowding of the observations towards
one end of the range of the independent variable x, while κ₃ is a measure of the crowding
towards the centre of the range rather than towards the ends. A full account of the use of 
these parameters has been given in a previous paper (Guest, 1953a). 

A short section has been included on the use of step functions in the fitting of polynomials, 
with a full discussion of the cases where the polynomial is only of the first degree. Finally, 
the various methods of obtaining a fitted curve are compared from the point of view of time 
required, of efficiency, and of bias of the estimates. 


REPRESENTATION OF THE OBSERVATIONS 


When the values of the independent variable x_n at the n points of observation are arranged
in order of magnitude, each observation may be identified by the number ε giving its
position in the sequence, ε taking integral or half-integral values from +½(n−1) to −½(n−1).
The observations at the points x_n(ε) will be represented by the symbols y_n(ε). In the present
discussion the system of points x_n(ε) will be replaced by a smoothed-out system X_n(ε)
obtained by fitting a curve of the third degree in ε to the values x_n(ε). With a suitable choice
of origin

\[ X_n(\epsilon) = k_{1n}T_{1n}(\epsilon) + k_{2n}T_{2n}(\epsilon) + k_{3n}T_{3n}(\epsilon), \tag{1} \]

where

\[ k_{jn} = \sum x_n(\epsilon)\,T_{jn}(\epsilon) \Big/ \sum T_{jn}^2(\epsilon), \tag{2} \]

and T_jn(ε) is the orthogonal polynomial of degree j in ε.
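
The coefficients (2) can be computed directly. The Python sketch below uses the standard orthogonal polynomials with leading coefficient unity on n equally spaced positions, T₁ = ε, T₂ = ε² − (n² − 1)/12 and T₃ = ε³ − (3n² − 7)ε/20; these explicit forms are well known but are not written out in the text above, and the function names are ours. The origin of x is assumed to have been chosen so that no constant term is needed.

```python
def positions(n):
    """Position numbers epsilon, from -(n-1)/2 to +(n-1)/2 in unit steps."""
    return [i - (n - 1) / 2 for i in range(n)]


def T(j, n, e):
    """Orthogonal polynomial of degree j (1 to 3) on n positions, leading coefficient 1."""
    if j == 1:
        return e
    if j == 2:
        return e * e - (n * n - 1) / 12
    if j == 3:
        return e ** 3 - (3 * n * n - 7) * e / 20
    raise ValueError("only degrees 1, 2 and 3 are provided")


def fit_k(x):
    """Equation (2): k_j = sum x(e) T_j(e) / sum T_j(e)^2, for j = 1, 2, 3."""
    n = len(x)
    eps = positions(n)
    return [sum(xi * T(j, n, e) for xi, e in zip(x, eps)) /
            sum(T(j, n, e) ** 2 for e in eps)
            for j in (1, 2, 3)]


if __name__ == "__main__":
    n = 12
    # hypothetical unequally spaced x-values built from known coefficients
    x = [2.0 * T(1, n, e) + 0.1 * T(2, n, e) + 0.01 * T(3, n, e) for e in positions(n)]
    print(fit_k(x))
```

Because the polynomials are mutually orthogonal, each coefficient is obtained independently of the others.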
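If the $n$ observations are converted by grouping into $N = n/r$ values 

$$y_N(\epsilon) = \sum_{z=-\frac12(r-1)}^{\frac12(r-1)} y_n(r\epsilon + z)$$

at points 

$$x_N(\epsilon) = \sum_{z} x_n(r\epsilon + z),$$

the smoothed-out system for the grouped observations may be written as 

$$X_N(\epsilon) = k_{1N}T_{1N}(\epsilon) + k_{2N}T_{2N}(\epsilon) + k_{3N}T_{3N}(\epsilon), \tag{3}$$

where, from the equations connecting the polynomials $T_{jn}(\epsilon)$, $T_{jN}(\epsilon)$ (Guest, 1954), 

$$k_{1N} = r^2\bigl[k_{1n} - \tfrac{1}{10}(r^2-1)k_{3n}\bigr], \qquad k_{2N} = r^3k_{2n}, \qquad k_{3N} = r^4k_{3n}. \tag{4}$$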
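
The relations connecting the two sets of polynomials can be checked numerically. The sketch below is only illustrative: it assumes the usual orthogonal polynomials with leading coefficient unity on $n$ equally spaced positions, $T_1 = \epsilon$, $T_2 = \epsilon^2 - (n^2-1)/12$, $T_3 = \epsilon^3 - (3n^2-7)\epsilon/20$ (these explicit forms and all function names are assumptions, not taken from the text), and verifies in exact arithmetic that sums over groups of $r$ consecutive positions equal $r^2T_{1N}$, $r^3T_{2N}$ and $r^4T_{3N} - \tfrac{1}{10}r^2(r^2-1)T_{1N}$.

```python
from fractions import Fraction as F


def T(j, eps, n):
    """Monic orthogonal polynomial of degree j on n equally spaced points (assumed form)."""
    if j == 1:
        return eps
    if j == 2:
        return eps**2 - F(n * n - 1, 12)
    if j == 3:
        return eps**3 - F(3 * n * n - 7, 20) * eps
    raise ValueError("only degrees 1 to 3 are handled")


def positions(n):
    """Position numbers -(n-1)/2, ..., +(n-1)/2 as exact fractions."""
    return [F(2 * i - (n - 1), 2) for i in range(n)]


def grouped_sums(j, n, r):
    """Sum T_j over each block of r consecutive positions (n must be a multiple of r)."""
    eps = positions(n)
    return [sum(T(j, e, n) for e in eps[g * r:(g + 1) * r]) for g in range(n // r)]


def check(n, r):
    """True when the three grouping relations hold exactly for this n and r."""
    N = n // r
    E = positions(N)
    ok1 = grouped_sums(1, n, r) == [r**2 * T(1, e, N) for e in E]
    ok2 = grouped_sums(2, n, r) == [r**3 * T(2, e, N) for e in E]
    ok3 = grouped_sums(3, n, r) == [
        r**4 * T(3, e, N) - F(r**2 * (r**2 - 1), 10) * T(1, e, N) for e in E
    ]
    return ok1 and ok2 and ok3
```

Calling `check(68, 4)` exercises the case of 17 groups of four; the third-degree relation is the one that produces the correction term in the first-degree coefficient.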
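It is convenient to remove a scale factor 

$$\phi_n^{-1} = k_{1n} + \tfrac{1}{10}n^2k_{3n} \tag{5}$$

from equation (1), and to change to polynomials 

$$\tau_{jn}(\epsilon) = n^{-j+1}T_{jn}(\epsilon). \tag{6}$$

Then 

$$X_n(\epsilon) = \phi_n^{-1}\xi_n(\epsilon), \tag{7}$$

where 

$$\xi_n(\epsilon) = \kappa_{1n}\tau_{1n}(\epsilon) + \kappa_{2n}\tau_{2n}(\epsilon) + 2\kappa_{3n}\tau_{3n}(\epsilon), \tag{8}$$

$$\kappa_{1n} = \phi_nk_{1n}, \qquad \kappa_{2n} = n\phi_nk_{2n}, \qquad \kappa_{3n} = \tfrac12n^2\phi_nk_{3n}, \tag{9}$$

and 

$$\kappa_{1n} = 1 - \tfrac15\kappa_{3n}. \tag{10}$$

The advantage of this notation is that the coefficients $\kappa_{jn}$ are all of the same order of 
magnitude, as also are the orthogonal polynomials $\tau_{jn}(\epsilon)$ (Guest, 1953a). 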


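For the grouped observations $X_N(\epsilon)$ is replaced by 

$$\xi_N(\epsilon) = \kappa_{1N}\tau_{1N}(\epsilon) + \kappa_{2N}\tau_{2N}(\epsilon) + 2\kappa_{3N}\tau_{3N}(\epsilon), \tag{11}$$

where 

$$X_N(\epsilon) = \phi_n^{-1}r^2\,\xi_N(\epsilon), \tag{12}$$

and 

$$\kappa_{1N} = r^{-2}\phi_nk_{1N}, \qquad \kappa_{2N} = Nr^{-2}\phi_nk_{2N}, \qquad \kappa_{3N} = \tfrac12N^2r^{-2}\phi_nk_{3N}. \tag{13}$$

From equations (4), connecting $k_{jn}$, $k_{jN}$, and equations (9) and (13), it is found that 

$$\kappa_{3N} = \kappa_{3n}, \qquad \kappa_{2N} = \kappa_{2n}, \qquad \kappa_{1N} = \kappa_{1n} - \tfrac15(N^{-2} - n^{-2})\kappa_{3n}. \tag{14}$$
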
It will be assumed that $n$ is so large that $n^{-2}$ can be neglected. Then $\xi_N(\epsilon)$ can be written in 
the form 

$$\xi_N(\epsilon) = \sum_{j=1}^{3}(c_{1j} - N^{-2}d_{1j})\,\tau_{jN}(\epsilon), \tag{15}$$

where 

$$c_{11} = 1 - \tfrac15\kappa_3, \quad d_{11} = \tfrac15\kappa_3; \qquad c_{12} = \kappa_2, \quad d_{12} = 0; \qquad c_{13} = 2\kappa_3, \quad d_{13} = 0, \tag{16}$$

and $\kappa_j$ is written for $\kappa_{jn}$. 


EFFICIENCIES OF THE ESTIMATES 


The fitted curve of degree $p$ in $x$ is usually expressed in the form of a power series, 

$$u_p(x) = \sum_{j=0}^{p} b_{pj}x^j. \tag{17}$$

Less commonly, the curve may be expressed as a series of orthogonal polynomials, 

$$u_p(x) = \sum_{j=0}^{p} a_jT_j(x), \tag{18}$$

where $T_j(x)$ is the Tchebycheff polynomial of degree $j$ in the variable $x$ with the leading 
coefficient unity.* However, for theoretical discussions, the second form is much the more 
convenient because of the simplification brought about by the orthogonal property. The 
$b_{pj}$ are linear functions of the $a_j$; in particular, the coefficient of highest degree $b_{pp}$ is 
identical with $a_p$. It is the coefficients $a_j$ which are obtained in the forward section of the 
standard Doolittle method for the solution of the normal equations. The concentration on 
the orthogonal coefficients must not be taken to imply that the actual fitting is to be done 
by calculating the orthogonal polynomials at the points of observation; in fact, the usual 
scheme based on power moments should be employed. The relation between the power series 
and the orthogonal representation has been fully discussed by Guest (1950) and by Hayes 
& Vickers (1951). 
The variance of $a_j$ is given by the formula 

$$\operatorname{var} a_j = \operatorname{var} y \Big/ \sum_i T_j^2(x_i). \tag{19}$$

The values $x_i$ are, in the representation used here, replaced by the smoothed set $X_n(\epsilon)$. 
Using equations (7), (8) and (10), and the standard expressions for $\tau_{jn}(\epsilon) = n^{-j+1}T_{jn}(\epsilon)$, it 
can be shown that $\sum T_j^2(x_i)$ is given by an approximate equation of the form 

$$\sum_i T_j^2(x_i) = (\phi_n^{-1}n)^{2j}\,nf_j\,(1 - n^{-2}g_j), \tag{20}$$

where higher powers of $n^{-2}$ have been neglected. Similarly, for the grouped set, 

$$\sum_i T_{jN}^2(x_{Ni}) = (\phi_n^{-1}Nr^2)^{2j}\,Nf_j\,(1 - N^{-2}g_j). \tag{21}$$


* It should be noted that $T_j(x)$ and $T_j(\epsilon)$ are different sets of polynomials. The use of the symbol 
$T_j$ to denote any set of orthogonal polynomials is convenient, but it is realized that it may be 
confusing in certain cases. The two sets of orthogonal polynomials are here distinguished by the use of 
different symbols $x$ and $\epsilon$ for the variables. 
In these formulae $f_j$ and $g_j$ are functions of $\kappa_2$ and $\kappa_3$, but not of $n$ or $N$. It will be assumed in 
the subsequent discussion that the number of observations is so large that the term in $n^{-2}$ 
may be omitted. 

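Since $y_{Ni}$ is the sum of $r$ observations $y_i$, 

$$\operatorname{var} y_{Ni} = r \operatorname{var} y_i. \tag{22}$$

If the curve fitted to the $N$ grouped observations is written as 

$$u_{pN}(x) = \sum_j a_{jN}T_{jN}(x), \tag{23}$$

then 

$$\operatorname{var} a_{jN} = \operatorname{var} y_{Ni} \Big/ \sum_i T_{jN}^2(x_{Ni})$$

and 

$$(\operatorname{var} a_j)/(\operatorname{var} a_{jN}) = r^{2j-2}\{1 - N^{-2}g_j\}.$$

The efficiency $\eta(a_{jN})$ of the grouped estimate will then be given by 

$$\eta(a_{jN}) = 1 - N^{-2}g_j. \tag{24}$$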

In practice the $g_j$ are too complicated to be represented by an explicit formula, but it is 
fairly straightforward to calculate them for given values of $\kappa_2$ and $\kappa_3$. The expressions for 
$\eta(a_{jN})$ are listed in Table 1 for polynomials of the first, second and third degrees, and for 
$\kappa_2^2 = 0(0.5)1.0$, $\kappa_3 = -1.0(0.5)1.0$. 

From these expressions the efficiencies for any value of $N$ can be determined. The larger 
the value of $N$ the higher the efficiencies, but also the longer the time required for the 
calculations. Some compromise must be made, and it appears that suitable values of $N$ lie 
in the range 9 to 12 for a second-degree curve, and in the range 16 to 21 for a third-degree curve. 
For a first-degree curve five groups are sufficient. The efficiencies for these suggested values 
of $N$ are shown in Table 2. 
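
The trade-off between the number of groups and the efficiency can be made concrete with a short calculation. The sketch below assumes the form $\eta = 1 - g_j/N^2$ of equation (24) and uses the three values of $g_j$ (1.46, 9.9 and 31.1) that Table 4(c) quotes from Table 1 for the worked example; the function names are illustrative only.

```python
def grouped_efficiency(g, N):
    """Efficiency 1 - g / N**2 of a grouped estimate, the form of equation (24)."""
    return 1.0 - g / N**2


# g values quoted in Table 4(c) for the worked example (degrees 1, 2 and 3)
g_values = {1: 1.46, 2: 9.9, 3: 31.1}


def efficiency_table(N):
    """Efficiencies of the first-, second- and third-degree coefficients for N groups."""
    return {j: grouped_efficiency(g, N) for j, g in g_values.items()}


if __name__ == "__main__":
    for N in (9, 12, 17, 21):
        print(N, {j: round(e, 3) for j, e in efficiency_table(N).items()})
```

Running the loop shows how quickly the third-degree efficiency falls as $N$ is reduced, which is why more groups are suggested for the higher-degree curves.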


BIAS OF THE GROUPED ESTIMATES 


If the polynomial is of the second or third degree, grouping will usually lead to bias in the 
estimates. The origin of the bias is seen most clearly by considering the power-series 
representation. Denoting 'true' values by a prime superscript, the expectation of the 
observation corresponding to a given value $x_i$ is 

$$E(y_i) = \sum_j b'_{pj}x_i^j.$$

Hence 

$$E(y_{Ni}) = \sum_r E(y_i) = \sum_j b'_{pj}\sum_r x_i^j,$$

the suffix $r$ attached to the summation sign indicating that the sum is taken over the 
particular group. Now 

$$\sum_r x_i^j \ne r^{-j+1}\Bigl(\sum_r x_i\Bigr)^j = r^{-j+1}x_{Ni}^j$$

unless $j$ is 0 or 1, and so 

$$E(y_{Ni}) \ne \sum_j (r^{-j+1}b'_{pj})\,x_{Ni}^j. \tag{25}$$

The true value of the coefficient for the grouped curve is 

$$b'_{pjN} = r^{-j+1}b'_{pj}.$$

Thus (25) becomes 

$$E(y_{Ni}) \ne \sum_j b'_{pjN}x_{Ni}^j. \tag{26}$$
Because of this inequality the estimates $b_{pjN}$ obtained from the ordinary normal equations 

$$\sum_i\Bigl(y_{Ni} - \sum_j b_{pjN}x_{Ni}^j\Bigr)x_{Ni}^k = 0 \qquad (k = 0, 1, \ldots, p) \tag{27}$$

will in fact be biased estimates. 
Again, it is more convenient (although the arithmetic is still very complicated) to calculate 
the bias in the orthogonal coefficients rather than in the power-series coefficients. The bias 
in $a_{iN}$ is defined as the difference of the expectation from the true value, 

$$E(a_{iN}) - a'_{iN}.$$

The bias turns out to be, to order $N^{-2}$, of the form 

$$N^{-2}\sum_{j=0}^{p} G_{ij}\,(\phi_n^{-1}Nr^2)^{j-i}\,a'_{jN}, \tag{28}$$

where the $G_{ij}$ are functions of $\kappa_2$ and $\kappa_3$. The true values $a'_{jN}$ are of course unknown, but an 
estimate of the bias can be obtained by substituting the calculated values $a_{jN}$ for $a'_{jN}$ in (28). 

From (11) and the standard expressions for $\tau_{jN}(\epsilon)$, the range of $\xi_N(\epsilon)$ is $(N-1)(\kappa_{1N} + \tfrac15\kappa_{3N})$. 
So, from (12), $\phi_n^{-1}Nr^2$ is approximately the range of $X_N(\epsilon)$ and hence also of $x_{Ni}$, that is, the 
difference between the greatest and least values. Thus the estimate of bias can be written as 

$$N^{-2}\sum_{j=0}^{p} G_{ij}\,a_{jN}\,(\operatorname{Ra} x_{Ni})^{j-i}. \tag{29}$$


The ratio of bias to standard error will be more useful than the actual bias in deciding 
whether the grouping will be satisfactory. The standard error of $a_{iN}$ is, from (21), 

$$(\phi_n^{-1}Nr^2)^{-i}\,N^{-1/2}f_i^{-1/2}\ \text{S.E.}(y_N),$$

and so the ratio can be put in the form 

$$\frac{\text{Bias}\ a_{iN}}{\text{S.E.}\ a_{iN}} = \frac{1}{\sigma_N}\,N^{-3/2}\sum_j B_{ij}\,a_{jN}(\operatorname{Ra} x_{Ni})^j, \tag{30}$$

where $\sigma_N$ is the standard error of a grouped observation and the $B_{ij}$ are functions of $\kappa_2$ 
and $\kappa_3$. $B_{i0}$ and $B_{i1}$ vanish, and so for the third-degree polynomial the bias depends only on 
$a_2$ and $a_3$, while for the second-degree polynomial the bias depends only on $a_2$. As was the 
case with the $g_j$, the explicit formulae for the $B_{ij}$ are excessively complicated, but it is 
possible to obtain numerical values for selected $\kappa_2$ and $\kappa_3$. Some of these values are listed 
in Table 3. 

It will often be necessary to ascertain before proceeding with the calculations whether 
the grouping will give rise to significant bias. In terms of the observations before grouping 
equation (30) becomes 

$$\frac{\text{Bias}\ a_{iN}}{\text{S.E.}\ a_{iN}} = \frac{\sqrt{n}}{\sigma}\,N^{-2}\sum_j B_{ij}A_{jn}, \tag{31}$$

where $\sigma$ is the standard error of an observation and $A_{jn}$ is written for $a_{jn}(\operatorname{Ra} x_i)^j$. The values 
$A_{jn}$ can be estimated in the following way. If the five values of $y_i$ spaced at intervals of 
one-quarter of the range of $x_i$ are denoted by $y(+\tfrac12)$, $y(+\tfrac14)$, $y(0)$, $y(-\tfrac14)$, $y(-\tfrac12)$, then 

$$A_{2n} = 2\bigl[y(+\tfrac12) + y(-\tfrac12) - 2y(0)\bigr],$$

$$A_{3n} = \tfrac{16}{3}\bigl\{[y(+\tfrac12) - y(-\tfrac12)] - 2[y(+\tfrac14) - y(-\tfrac14)]\bigr\}. \tag{32}$$
These are actually the values for a curve passing through the five points. The estimate of 
$A_{2n}$ will usually be reasonably accurate, the estimate of $A_{3n}$ much less so. If the scatter of 
the observations is large an average value of $y$ in the particular region should be taken. 
$\sigma$ can be estimated roughly from the scatter of the observations. $\kappa_2$ and $\kappa_3$ can be estimated 
from the values of $x$ corresponding to values of $\epsilon$ of $\pm\tfrac12(n-1)$, $\pm\tfrac14(n-1)$, 0, by the formulae 

$$\kappa_2 = \frac{2\bigl[x(+\tfrac12) + x(-\tfrac12) - 2x(0)\bigr]}{x(+\tfrac12) - x(-\tfrac12)},$$

$$\kappa_3 = \frac{\tfrac83\bigl\{[x(+\tfrac12) - x(-\tfrac12)] - 2[x(+\tfrac14) - x(-\tfrac14)]\bigr\}}{x(+\tfrac12) - x(-\tfrac12)}, \tag{33}$$

where $x(+\tfrac12)$ denotes the value of $x$ at $\epsilon = +\tfrac12(n-1)$, and so on; $\epsilon$ is the number specifying the position in the sequence. 
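
These rough estimates are easy to compute mechanically. The sketch below assumes that the five-point formulae have the form $A_2 = 2[y(+\tfrac12) + y(-\tfrac12) - 2y(0)]$, $A_3 = \tfrac{16}{3}\{[y(+\tfrac12) - y(-\tfrac12)] - 2[y(+\tfrac14) - y(-\tfrac14)]\}$, with the analogous expressions divided by the range for $\kappa_2$ and $\kappa_3$ (factor $\tfrac83$ for $\kappa_3$); these factors follow from fitting an exact cubic through five equally spaced points and are an assumption here, as are the function names. The inputs are the rough values quoted in the illustrative example.

```python
def rough_A(y_hi, y_q_hi, y_mid, y_q_lo, y_lo):
    """Five-point estimates of A2 and A3 from y at +1/2, +1/4, 0, -1/4, -1/2 of the range."""
    A2 = 2 * (y_hi + y_lo - 2 * y_mid)
    A3 = (16 / 3) * ((y_hi - y_lo) - 2 * (y_q_hi - y_q_lo))
    return A2, A3


def rough_kappa(x_hi, x_q_hi, x_mid, x_q_lo, x_lo):
    """Five-point estimates of kappa2 and kappa3 from x at the same five positions."""
    span = x_hi - x_lo
    k2 = 2 * (x_hi + x_lo - 2 * x_mid) / span
    k3 = (8 / 3) * (span - 2 * (x_q_hi - x_q_lo)) / span
    return k2, k3


# rough values quoted in the illustrative example
A2, A3 = rough_A(110, 80, 50, 120, 290)
k2, k3 = rough_kappa(29.6, 11.0, 1.1, -6.5, -15.2)
```

With these inputs `A2` is 600 and `A3` is about -533 (the example rounds it to -550), while `k2` and `k3` come out near 0.54 and 0.58, in line with the values 0.54 and 0.6 used in the example.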


ILLUSTRATIVE EXAMPLE 


In Table 4 is listed a set of 67 observations of the mechanical equivalent of heat as a function 
of temperature (Jaeger & von Steinwehr, 1921). Suitable constants have been subtracted 
from the original observations to reduce their magnitudes. A rough check on the order of 
the biases for $N = 17$ will first be made. 

The range of $x_i$ is from +30 to −15. Hence values of $y$ in the neighbourhood of $x$ = +30, 
+19, +7, −4, −15 are required for substitution in formulae (32). Average values in these 
regions are of the order of 

$y(+\tfrac12) = 110$, $y(+\tfrac14) = 80$, $y(0) = 50$, $y(-\tfrac14) = 120$, $y(-\tfrac12) = 290$, 

and so $A_{2n} = +600$, $A_{3n} = -550$. 


These values are of course quite rough, but will suffice for the estimation of the biases. 
The standard error $\sigma$, estimated from the scatter of the observations, is about 30. Since 
there are 67 observations, the values of $x_i$ for the observations numbered 1, 17½, 34, 50½, 67 
are required for the estimation of $\kappa_2$ and $\kappa_3$ from formulae (33). These values are: 

$x(+\tfrac12) = 29.6$, $x(+\tfrac14) = 11.0$, $x(0) = 1.1$, $x(-\tfrac14) = -6.5$, $x(-\tfrac12) = -15.2$, 

and so $\kappa_2 = 0.54$, $\kappa_2^2 \approx 0.3$, $\kappa_3 = 0.6$. 
Hence, from Table 3, 
B.,~ 0-07, Bo.~ 0-04, Byg~ 0-00, 
By~ 0: 12, Byo~ 0-06, By,~ 0-03. 
From equation (31), taking the first of the values in each row above, 

$$\frac{\text{Bias}}{\text{S.E.}} = \frac{\sqrt{67}}{30}\cdot\frac{1}{17^2}\,(0.07 \times 600 - 0.12 \times 550) = -0.02.$$

The ratios for the other coefficients will be of the same order, and so the bias with $N = 17$ 
will be negligible. It will be observed that this extremely small bias in the estimates occurs 
because $A_{2n}$ and $A_{3n}$ are of opposite sign, but even if they were of the same sign the ratio 
would still only be 0.10. 
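
The arithmetic of this check can be reproduced in a few lines. The sketch below assumes the ratio has the form $(\sqrt{n}/\sigma)\,N^{-2}\sum B A$ used in the numerical line above; the function name and argument layout are illustrative, and the inputs are the rough values quoted in the example.

```python
import math


def bias_ratio(n, sigma, N, terms):
    """(sqrt(n) / sigma) * N**-2 * sum(B * A): ratio of bias to standard error."""
    return math.sqrt(n) / sigma / N**2 * sum(B * A for B, A in terms)


# A values of opposite sign, as in the example
opposite = bias_ratio(67, 30, 17, [(0.07, 600), (0.12, -550)])
# the same magnitudes with equal signs, the less favourable case mentioned in the text
same = bias_ratio(67, 30, 17, [(0.07, 600), (0.12, 550)])
```

Here `opposite` is about -0.02 and `same` about 0.10, matching the two figures given in the text.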
The grouped values are shown in Table 4(b). The central group contains only three 
observations, and so the sums for this group are increased by the factor 4/3. The calculations using the 
Doolittle scheme have been carried out both for the original set of observations and for the 
grouped set. Table 4(c) gives the calculated values of $\sum T^2(x)$ and the efficiencies of the 
grouped estimates. These are compared with the efficiencies as predicted by Table 1 for 
$\kappa_2^2 = 0.3$, $\kappa_3 = 0.6$. The values given by Table 1 are quite close to the true values; in fact, 
closer than might have been expected considering the inaccuracy of the estimate of $\kappa_3$. 
Table 4(d) gives the orthogonal coefficients, and the power-series coefficients for the third-degree 
curves. The biases are seen to be completely negligible, as was predicted. There will, 
of course, be random differences between the two sets of coefficients, and these completely 
swamp any bias effects in this case. 
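
The formation of the grouped values can be sketched as follows. The helper below is hypothetical (its name and interface are not from the text): it sums consecutive blocks and rescales any block that does not contain $r$ observations, which is how the factor 4/3 arises for the central group. The data are the first four and the three central observations of Table 4(a).

```python
def group_sums(values, sizes, r):
    """Sum consecutive blocks of the given sizes, rescaling each sum to r observations."""
    sums, start = [], 0
    for size in sizes:
        block = values[start:start + size]
        sums.append(sum(block) * r / size)  # factor r/size is 1 for a full group
        start += size
    return sums


# first four observations and the three central observations of Table 4(a)
x = [29.60, 28.36, 26.96, 25.79, 1.15, 1.13, 0.01]
y = [108, 102, 103, 74, 89, 96, 99]

gx = group_sums(x, [4, 3], 4)
gy = group_sums(y, [4, 3], 4)
```

Rounded to the precision of Table 4(b), `gx` gives 110.71 and 3.05 and `gy` gives 387 and 379, the entries shown there for the first and the central group.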


Table 4. Illustrative example 

(a) Original observations (n = 67) 

     x    |  y  |     x    |  y  |     x    |  y  |     x    |  y 
  +29.60  | 108 |  +11.49  |  88 |   -0.05  | 119 |   -7.13  | 101 
  +28.36  | 102 |  +10.60  |  57 |   -0.25  | 106 |   -7.40  | 165 
  +26.96  | 103 |   +9.19  |  58 |   -1.17  | 104 |   -7.41  | 169 
  +25.79  |  74 |   +9.15  |  79 |   -1.18  | 117 |   -7.94  | 187 
  +25.56  |  25 |   +7.76  |  66 |   -1.45  | 104 |   -8.39  | 153 
  +24.34  | 103 |   +6.33  |  41 |   -1.58  |  97 |   -8.55  | 175 
  +23.09  | 104 |   +6.24  |  71 |   -1.58  | 104 |   -8.67  | 202 
  +19.41  |  66 |   +5.55  |  50 |   -2.59  | 133 |   -9.98  | 176 
  +17.19  |  35 |   +5.36  |  70 |   -2.67  |  90 |  -11.05  | 207 
  +16.64  | 173 |   +4.82  |  60 |   -3.61  | 137 |  -11.19  | 200 
  +15.79  |  85 |   +4.11  |  66 |   -3.69  | 130 |  -11.49  | 230 
  +15.75  |  83 |   +3.96  |  76 |   -5.01  | 118 |  -13.00  | 256 
  +14.39  |  69 |   +3.24  |  71 |   -5.12  | 114 |  -13.62  | 259 
  +14.32  | 113 |   +2.53  |  80 |   -6.00  | 201 |  -13.85  | 245 
  +13.98  |  45 |   +1.80  |  83 |   -6.03  | 114 |  -14.35  | 315 
  +12.54  |  68 |   +1.41  |  80 |   -6.97  | 137 |  -15.25  | 291 
          |     |   +1.15  |  89 |          |     |          | 
          |     |   +1.13  |  96 |          |     |          | 
          |     |   +0.01  |  99 |          |     |          | 







(b) Grouped observations (N = 17) 

     x     |  y  |     x    |  y  |     x    |  y  |     x    |   y 
  +110.71  | 387 |  +40.43  | 282 |   -2.65  | 446 |  -29.88  |  622 
   +92.40  | 298 |  +25.88  | 228 |   -7.20  | 438 |  -35.59  |  706 
   +65.37  | 376 |  +18.25  | 272 |  -14.98  | 475 |  -46.32  |  891 
   +55.23  | 295 |   +8.98  | 314 |  -24.12  | 566 |  -57.07  | 1110 
           |     |   +3.05  | 379 |          |     |          | 
Table 4 (cont.) 

(c) Efficiencies $\eta(a_{jN})$ 

  j | $\sum T_{jn}^2$ | $\sum T_{jN}^2$ | $\eta = \sum T_{jN}^2 / r^{2j-1}\sum T_{jn}^2$ | $\eta$ (Table 1) 
  1 | 9265            | 36861           | 0.9946 | 1 - 1.46/289 = 0.9950 
  2 | 143.66 × 10^4   | 8922 × 10^4     | 0.970  | 1 - 9.9/289 = 0.966 
  3 | 1.913 × 10^8    | 1780 × 10^8     | 0.909  | 1 - 31.1/289 = 0.892 

(d) Orthogonal coefficients a and power-series coefficients b 

  j | $a_{jn}$          | $a_{jN}$  | $r^{j-1}a_{jN}$ | $b_{3jn}$      | $b_{3jN}$ | $r^{j-1}b_{3jN}$ 
  0 |                   |           |                 | 92.4 ± 4.4     | 372.18    | 93.0 
  1 | -3.59 ± 0.25      | -3.59     | -3.59           | -5.57 ± 0.48   | -5.60     | -5.60 
  2 | +0.261 ± 0.020    | +0.0661   | +0.264          | +0.407 ± 0.040 | +0.1002   | +0.401 
  3 | -0.0074 ± 0.0018  | -0.000451 | -0.0072         | $b_{33} = a_3$ |           | 








The standard error of an observation may be estimated from the usual formulae 

$$\Bigl\{\sum v_i^2/(n-p-1)\Bigr\}^{1/2} \quad\text{and}\quad \Bigl\{\sum v_{Ni}^2/(N-p-1)\Bigr\}^{1/2}$$

for the ungrouped and grouped cases respectively. In both cases the sum of the squares of 
the residuals can be found from the formula 

$$\sum v^2 = \sum y^2 - \sum_j a_j^2\sum T_j^2.$$

In the present example the estimate of $\sigma$ in the ungrouped case is 24, and the estimate of 
$\sigma_N$ in the grouped case is 37. Since, by (22), $\sigma_N = \sqrt{r}\,\sigma$ and $r = 4$ here, these should be in the ratio 1:2. It is not clear whether the 
departure from this ratio is significant. 


WEIGHTING OF GROUPED OBSERVATIONS 
Often it is not possible to find a suitable pair of values $N$, $r$, such that $n = Nr$. More usually, 

$$n = Nr + \nu, \tag{34}$$

where $\nu$ is small. The $\nu$ additional observations should be included in $\nu$ groups near the centre 
of the range of $x_i$, and in forming these groups the sums of the $x_i$ and $y_i$ values must be 
multiplied by $r/(r+1)$ to bring them to the same scale as the other groups. If $\nu$ is negative 
the factor is $r/(r-1)$. 

These $\nu$ groups should strictly be given a weight $(r+1)/r$ in forming the moments and the 
sums of the powers. This, however, greatly increases the time required to evaluate these 
sums, and the calculations can be done much more rapidly if the weights are omitted. The 
omission of the weights does not lead to any increase in the bias, but it does produce a drop 
in the efficiency. A detailed investigation, too long to be reproduced here, indicates a drop 
in efficiency by a factor of the order of $1 - \nu r^{-2}N^{-1}$, which will almost always be negligible. 
Hence it is recommended that the grouped observations be left unweighted. 
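
The bookkeeping implied by formula (34) can be sketched as below. The helper is hypothetical: it takes $r$ as the integer nearest to $n/N$ and places the $|\nu|$ irregular groups in the middle of the sequence, which is one reading of "near the centre"; neither choice is prescribed in the text.

```python
def group_plan(n, N):
    """Return r, nu, the group sizes and the scale factors r/size for n = N*r + nu."""
    r = round(n / N)          # assumed choice: nearest integer group size
    nu = n - N * r            # number of surplus (or missing, if negative) observations
    step = 1 if nu > 0 else -1
    sizes = [r] * N
    first = (N - abs(nu)) // 2  # put the irregular groups in the middle
    for g in range(first, first + abs(nu)):
        sizes[g] += step
    factors = [r / s for s in sizes]  # multiply each group sum by this factor
    return r, nu, sizes, factors


r, nu, sizes, factors = group_plan(67, 17)
```

For the 67 observations of the example with $N = 17$ this gives $r = 4$, $\nu = -1$, a single central group of three observations and the scale factor 4/3 for that group.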
If the original set of observations $y_i$ have different weights $w_i$, then $\sum w_i$ replaces $n$ in 
formula (34), and the observation of weight $w_i$ is just regarded as equivalent to $w_i$ observations 
of unit weight. The observations are divided into $N$ groups each having the same $\sum w_i$. 


STEP-FUNCTION METHODS 


For the first-degree polynomial the use of step functions provides the most rapid method 
of determining the line which fits the observations. Since in practice the majority of the 
curves to be fitted will be of the first degree, a full discussion of the use of step functions in 
these cases will now be presented. 

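If the scale factor is removed, the independent variable may be represented as 

$$\xi(\epsilon) = \kappa_1\epsilon + \kappa_2n^{-1}(\epsilon^2 - \tfrac{1}{12}n^2) + 2\kappa_3n^{-2}(\epsilon^3 - \tfrac{3}{20}n^2\epsilon), \tag{35}$$

terms of the order of $n^{-2}$ being neglected. Since $\kappa_1$ is $1 - \tfrac15\kappa_3$, this can be regrouped as 

$$\xi(\epsilon) = \epsilon(1 - \tfrac12\kappa_3) + 2\kappa_3n^{-2}\epsilon^3 + \kappa_2n^{-1}(\epsilon^2 - \tfrac{1}{12}n^2). \tag{36}$$

If $w_1(\epsilon)$ is any function of $\epsilon$ for which $\sum w_1(\epsilon) = 0$, then 

$$b_1 = \sum w_1(\epsilon)\,y(\epsilon) \Big/ \sum w_1(\epsilon)\,\xi(\epsilon)$$

will provide an estimate of the slope of the straight line which fits the observations. The 
standard error of this estimate is given by 

$$\sigma^2(y)/\sigma^2(b_1) = \Bigl\{\sum w_1(\epsilon)\,\xi(\epsilon)\Bigr\}^2 \Big/ \Bigl[\sum w_1^2(\epsilon)\Bigr]. \tag{37}$$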
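
A slope estimate of this kind is simple to program. The sketch below is an illustration under stated assumptions rather than the author's computing scheme: it uses $N = 2m + 1$ steps of equal length with antisymmetric integer weights $-m, \ldots, m$ (so that the weights sum to zero), applies them to the observations ordered by $x$, and uses the observed $x$ values directly in place of the scaled variable. The function names are hypothetical.

```python
def step_weights(n, m):
    """Antisymmetric step weights -m..m over N = 2m + 1 equal steps of n ordered observations."""
    N = 2 * m + 1
    weights = []
    for i in range(n):
        d = 2 * i + 1 - n                    # twice the position number, always an integer
        k = (abs(d) * N + n) // (2 * n)      # index of the step containing this observation
        weights.append(k if d > 0 else -k)
    return weights


def step_slope(x, y, m=1):
    """Slope estimate sum(w*y) / sum(w*x); x must be sorted in increasing order."""
    w = step_weights(len(x), m)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * xi for wi, xi in zip(w, x))


# unequally spaced abscissae with an exactly linear response
x = [i**1.5 for i in range(30)]
y = [2 + 3 * xi for xi in x]
slope = step_slope(x, y, m=2)
```

Because the weights sum to zero, the constant term drops out and an exactly linear response returns its slope whatever the spacing; with `m=1` the weights reduce to -1, 0, +1 on the three thirds of the ordered data.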
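For a step function, $\sum w_1(\epsilon)f(\epsilon)$ is of the form 

$$\Bigl[\nu_1\Bigl(\sum_{0,\frac12}^{\frac12(n-1)} - \sum_{0,\frac12}^{\frac12(a_1-1)}\Bigr) + \nu_2\Bigl(\sum_{0,\frac12}^{\frac12(a_1-1)} - \sum_{0,\frac12}^{\frac12(a_2-1)}\Bigr) + \ldots + \nu_m\Bigl(\sum_{0,\frac12}^{\frac12(a_{m-1}-1)} - \sum_{0,\frac12}^{\frac12(a_m-1)}\Bigr)\Bigr]\{f(\epsilon) - f(-\epsilon)\}, \tag{38}$$

the numbers $\tfrac12(a_j - 1)$ being the values of $\epsilon$ at the ends of the steps. Neglecting terms of the 
order of $n^{-2}$, 

$$\sum_{0,\frac12}^{\frac12(a-1)}\{\epsilon^j - (-\epsilon)^j\} = \frac{a^{j+1}}{(j+1)\,2^j} \quad (j \text{ odd}), \qquad = 0 \quad (j \text{ even}), \tag{39}$$

where $n\alpha = a$. Hence, from (38), 

$$\sum w_1(\epsilon)\,\epsilon^j = \frac{n^{j+1}}{(j+1)\,2^j}\sum_{k=1}^{m}\nu_{m-k+1}\bigl(\alpha_{m-k}^{j+1} - \alpha_{m-k+1}^{j+1}\bigr) \tag{40}$$

when $j$ is odd, and 

$$\sum w_1^2(\epsilon) = n\sum_{k=1}^{m}\nu_{m-k+1}^2\,(\alpha_{m-k} - \alpha_{m-k+1}). \tag{41}$$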


For the equally spaced case it has been shown (Guest, 1954) that the estimate $b_1$ is of 
smallest standard error when the steps are of equal size and the weights are $m, m-1, \ldots, 1$. 
Using then the values 

$$\nu_{m-k+1} = k, \qquad \alpha_{m-k} = (2k+1)/(2m+1),$$

in (40), 

$$\sum w_1(\epsilon)\,\epsilon = \{2n^2/(2m+1)^2\}\sum k^2, \tag{42}$$

$$\sum w_1(\epsilon)\,\epsilon^3 = \{n^4/2(2m+1)^4\}\Bigl\{\sum k^2 + 4\sum k^4\Bigr\}, \tag{43}$$

while from (41) 

$$\sum w_1^2(\epsilon) = \{2n/(2m+1)\}\sum k^2. \tag{44}$$

Substituting these values in (37) and inserting the standard expressions for $\sum k^j$, it is found 
that 

$$\sigma^2(y)/\sigma^2(b_1) = [\tfrac{1}{12}n^3]\,[1 - (2m+1)^{-2}]\,[1 - \tfrac15\kappa_3\{1 + (2m+1)^{-2}\}]^2,$$

while for the least-squares estimate $b_1' = a_1$, (19) and (20) give 

$$\sigma^2(y)/\sigma^2(b_1') = n^3f_1.$$

The explicit expression for $f_1$ is 

$$12f_1 = 1 - \tfrac25\kappa_3 + \tfrac{1}{15}\kappa_2^2 + \tfrac{2}{35}\kappa_3^2.$$

Therefore the efficiency $\eta(b_1)$ of the estimate $b_1$ is given by 

$$\eta(b_1) = [1 - N^{-2}]\,[1 - \tfrac15\kappa_3(1 + N^{-2})]^2/12f_1, \tag{45}$$

where $N = 2m + 1$ is the number of groups. 

The function $\eta(b_1)$ is tabulated in Table 5 for values of $N$ equal to 3, 5, 7, $\infty$. The efficiencies 
when $N = 3$ are perhaps a little low, but the value $N = 5$, corresponding to double-step 
functions, gives satisfactory efficiencies. 


Table 5. Percentage efficiencies in first-degree step-function methods 

  $\kappa_2^2$ |              0               |             0.5              |             1.0 
  $\kappa_3$   | -1.0  -0.5    0    0.5   1.0 | -1.0  -0.5    0    0.5   1.0 | -1.0  -0.5    0    0.5   1.0 
  N = 3        | 91.1  90.4  88.9  86.3  81.8 | 89.1  88.0  86.0  82.9  77.9 | 87.1  85.7  83.3  79.7  74.3 
  N = 5        | 96.1  96.4  96.0  94.6  91.6 | 94.0  93.8  92.9  90.9  87.2 | 91.9  91.3  90.0  87.5  83.2 
  N = 7        | 97.5  98.0  98.0  97.0  94.4 | 95.3  95.4  94.8  93.2  89.9 | 93.2  92.9  91.8  89.7  85.7 
  N = ∞        | 98.8  99.6 100    99.5  97.4 | 96.6  97.0  96.8  95.6  92.7 | 94.5  94.5  93.8  91.9  88.4 
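
Entries of Table 5 can be regenerated from a closed form. The sketch below assumes that the efficiency is $[1 - N^{-2}][1 - \tfrac15\kappa_3(1 + N^{-2})]^2/12f_1$ with $12f_1 = 1 - \tfrac25\kappa_3 + \tfrac{1}{15}\kappa_2^2 + \tfrac{2}{35}\kappa_3^2$, and that the table is indexed by $\kappa_2^2$; both are readings of the printed formulae that reproduce the tabulated figures, and the function name is illustrative.

```python
import math


def step_efficiency(N, kappa2_sq, kappa3):
    """Efficiency of the first-degree step-function estimate with N equal steps."""
    f1_times_12 = 1 - 0.4 * kappa3 + kappa2_sq / 15 + 2 * kappa3**2 / 35
    inv_N2 = 0.0 if math.isinf(N) else 1.0 / N**2   # N = infinity gives the limiting row
    return (1 - inv_N2) * (1 - 0.2 * kappa3 * (1 + inv_N2))**2 / f1_times_12


if __name__ == "__main__":
    for N in (3, 5, 7, math.inf):
        row = [round(100 * step_efficiency(N, 0.0, k3), 1) for k3 in (-1.0, -0.5, 0.0, 0.5, 1.0)]
        print(N, row)
```

Printing the rows for $\kappa_2^2 = 0$ gives the first block of the table; other blocks follow by changing the second argument.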


An account has been given in two earlier papers (Guest, 1952, 1953b) of the use of single-step 
functions in the fitting of higher-degree polynomials. Step-function methods are rapid 
and give unbiased estimates, but the efficiencies are rather low if the departure from uniform 
spacing is at all pronounced. An investigation into the use of double-step functions has 
since been carried out, but it appears that the improvement in efficiency is only slight for 
markedly non-uniform spacing. One weakness of the earlier method is that different 
functions were used for the second- and third-degree curves. It has since been found that 
if the steps in the third-degree equation are weighted in the ratio 2:1, the same functions 
can be used for the second- and third-degree curves without any great drop in efficiency. 
The recommended step functions are now given by the expressions below: 


Zero degree: xn 
First degree: =n, + Un}, — Un; 
Second degree: “nq, + UNg9— Unjgg— Ung, + UN; 
Third degree: 22%, + Ungo— Ung — UNgg + VNgg + 2UNg, — 2Un. 
The observations are supposed numbered from 1 to $n$, and $\sum n_{jk}$ signifies the sum for all the 
observations from 1 to $n_{jk}$. The values $n_{jk}$ for the various step functions are the integers 
nearest to the numbers shown below: 
N41: 033N; No: O-11N; Nog: 0-25n; 
N31: 0-07N; Ngo: 0-15N; Ngg: 0°44; 


, 
Nik = N— N3K- 
CONCLUSION 


To obtain an idea of the times required for the various methods, the calculations were 
performed by each method for the example given in Table 4. Table 6 gives a summary of the 
times required to fit polynomials of different degrees (including the checking of the calculations) 
by the various methods, together with the efficiencies and the estimates of the 
coefficients $a_j$. It will be seen that for the first-degree polynomial the step-function method 
is by far the most rapid one. For the second-degree polynomial the step-function and the 
least-squares grouped methods require about the same time, while for the third-degree 
polynomial the least-squares grouped method is the most rapid. 


Table 6. Results for selected example (67 observations) 

                        |     1st degree        |            2nd degree                    |            3rd degree 
  Method                | Time (min.) | Effic. $\eta(a_1)$ | Time (min.) | Effic. $\eta(a_2)$ | $a_2$ (× 10^2) | Time (min.) | Effic. $\eta(a_3)$ | $a_3$ (× 10^3) 
  Least-squares         |  58 | 1.000 | 158 | 1.000 | 26.1 ± 2.3 | 246 | 1.000 | -7.4 ± 1.8 
  Least-squares grouped |  29 | 0.936 |  61 | 0.925 | 25.8       | 100 | 0.909 | -7.2 
  Single-step functions |  16 | 0.825 |  60 | 0.831 | 28.4       | 128 | 0.725 | -7.9 
  Double-step functions |  17 | 0.898 |  67 | 0.835 | 28.9       | 145 | 0.763 | -7.1 








The general conclusion to be drawn from the work described here is that each method has 
its disadvantages. The full least-squares method is efficient and without bias, but it takes 
a very long time, especially if the computor is not familiar with the technique. The least- 
squares grouped method is rapid and the efficiency is high, but there may be bias in the 
estimates. The single-step function method is rapid and of fair efficiency, and there is no 
bias, but the standard errors cannot be estimated in any simple way. 

Perhaps the following statements will provide a satisfactory guide in the selection of the 
appropriate method for the fitting of a second- or third-degree polynomial. If the scatter 
is large the least-squares grouped method should be used, since the bias is not then likely to 
be of importance. Cases will also arise in which it is possible to remove the greater part of 
the variation in the dependent variable by the use of approximate values of the coefficients. 
In particular, if the approximate contributions $b_2x^2 + b_3x^3$ are subtracted before grouping, 
the corresponding least-squares coefficients obtained from the grouped values will be so small 
that the bias due to grouping will be negligible. The efficiency is unaffected by this process. 
If this procedure is not convenient and the scatter is small the step-function method may 
be used, but if accurate estimates of standard error are also required there is no alternative 
but to fit a least-squares curve to the full set of observations. 
ON THE JOINT DISTRIBUTION OF THE CIRCULAR SERIAL 
CORRELATION COEFFICIENTS 


By G. S. WATSON† 
The Australian National University, Canberra, A.C.T. 


1. INTRODUCTION 


Quenouille (1949) applied Koopmans’s method (1942) to the problem of finding the exact 
joint distribution of the serial correlation coefficients in the null case. Madow’s (1945) device 
was used to derive the non-null joint density function. For odd sample sizes, he gave an 
explicit formula for the exact density function. Since the direct smoothing of this function 
is very difficult, Quenouille conjectured a smooth approximation to the exact density and 
supported his conjecture by various arguments about the expected form of the result. 
Jenkins (1954), following Dixon’s (1944) method, has found a smoothed density for the first 
two serial correlation coefficients which, unlike Quenouille’s density, has the correct 
moments up to order $n$. These smoothed densities were derived for the case of no mean 
corrections.‡ 

In §2 of the present paper, von Neumann’s (1941) method is generalized to give integral 
equations for the joint density of the serial correlations. From these the relationship of the 
null and non-null densities is easily derived. In §3 the exact density of the circular serial 
correlations is obtained for arbitrary sample sizes, and a fuller discussion is given of the 
interesting summation rule involved. In §4 an attempt is made to derive an approximate 
null density when mean corrections are made by a new smoothing method. The form found 
differs only slightly from Quenouille’s and is subject to the same weaknesses. 


2. SOME GENERAL RESULTS 


The basic problem is to find the joint probability density function of 

$$r_j = \frac{x'A_jx}{x'x} \qquad (j = 1, \ldots, q), \tag{1}$$

where the elements of the column vector $x$ are independent normal variates with zero mean 
and unit variance. It does not seem possible to make any progress unless the $A_j$ form a 
commutative set of matrices. In the present applications this means that serial correlations 
with a modified definition must be considered. With this assumption the problem may be 
reduced to that of the joint distribution of 

$$r_j = \frac{\lambda_0^{(j)}w_0 + \lambda_1^{(j)}w_1 + \ldots + \lambda_m^{(j)}w_m}{w_0 + w_1 + \ldots + w_m} \qquad (j = 1, \ldots, q), \tag{2}$$





† Most of the work reported in this paper was done while the author was a Research Officer in the 
Department of Applied Economics, University of Cambridge, and formed part of a thesis issued by the 
Institute of Statistics, University of North Carolina, Mimeograph Series no. 49. 

‡ Recently H. E. Daniels (1956) has derived the approximate joint density, with and without 
mean corrections, for moderate sample sizes, $n$, and $q$ not comparable with $n$. [See also the paper 
by G. M. Jenkins (1956). These two papers are printed on pp. 169 and 186 below. Ed.] 

where $\lambda_0^{(j)}, \lambda_1^{(j)}, \ldots, \lambda_m^{(j)}$ are the distinct latent roots of $A_j$ with multiplicities $2p_0, 2p_1, \ldots, 2p_m$ and 
$w_0, w_1, \ldots, w_m$ are independent gamma variables of orders $p_0, p_1, \ldots, p_m$. The multiplicities of 
the roots of $A_j$ associated with any latent vector must be independent of $j$ to obtain the 
reduction (2). 
Writing $r_j = u_j/v$ $(j = 1, \ldots, q)$, Pitman's theorem (1937) gives immediately 

$$E(r_1^{k_1}r_2^{k_2}r_3^{k_3}\ldots) = \frac{E(u_1^{k_1}u_2^{k_2}u_3^{k_3}\ldots)}{E(v^{k_1+k_2+k_3+\ldots})}, \tag{3}$$





since $v$ is independent of the set of values $r_1, r_2, \ldots, r_q$. Writing $f(r_1, \ldots, r_q)$ for the joint density 
function of $r_1, \ldots, r_q$ and $g(v) = v^{P-1}e^{-v}/\Gamma(P)$ with $P = \sum_{i=0}^{m}p_i$, the joint moment generating 
function of $u_1, u_2, \ldots, u_q$ is easily seen to be given by 

$$E\{\exp(t_1u_1 + t_2u_2 + \ldots + t_qu_q)\} = E\{\exp[v(r_1t_1 + r_2t_2 + \ldots + r_qt_q)]\}$$

$$= \int\!\cdots\!\int_{D_q} dr_1\ldots dr_q\int_0^\infty dv\; e^{v\sum r_it_i}\,g(v)\,f(r_1, \ldots, r_q)$$

$$= \int\!\cdots\!\int_{D_q}\frac{f(r_1, \ldots, r_q)}{(1 - \sum r_it_i)^P}\,dr_1\ldots dr_q, \tag{4}$$

where $D_q$ is the domain of joint variation of $r_1, r_2, \ldots, r_q$, defined below. But the left-hand 
side of (4) may be evaluated directly to give the integral equation for $f(r_1, \ldots, r_q)$ 

$$\prod_{i=0}^{m}\Bigl(1 - \sum_{j=1}^{q}\lambda_i^{(j)}t_j\Bigr)^{-p_i} = \int\!\cdots\!\int_{D_q}\frac{f(r_1, \ldots, r_q)}{(1 - \sum r_jt_j)^P}\,dr_1\ldots dr_q. \tag{5}$$

This is a generalization of von Neumann's (1941) integral equation for the density function 
of a single ratio. Using the results of von Neumann & Morgenstern (1947), the domain $D_q$ is 
seen to be the least convex polyhedron enclosing the points $(\lambda_i^{(1)}, \ldots, \lambda_i^{(q)})$ $(i = 0, 1, \ldots, m)$ in 
the $q$-space of $r_1, r_2, \ldots, r_q$. 
The integral equation (5) provides a simple method of deriving the joint density of 
$r_1, r_2, \ldots, r_q$ when the $x$ vector has a multivariate normal density function proportional to 

$$\exp\{-\tfrac12x'(I - \delta_1A_1 - \ldots - \delta_qA_q)x\}, \tag{6}$$

where $\delta_1, \ldots, \delta_q$ are such that the density is non-singular. Since 

$$\sum_j t_jx'A_jx = \frac{\sum_j t_jx'A_jx}{x'x - \sum_j\delta_jx'A_jx}\;x'\Bigl(I - \sum_j\delta_jA_j\Bigr)x, \tag{7}$$

where the first factor of (7) is a homogeneous function of degree zero in gamma variables and 
where the second factor of (7) is the sum of these variates, a gamma-$P$ variate, the 
argument leading to (5) now gives 

$$\prod_{i=0}^{m}\Bigl(1 - \frac{\lambda_i^{(1)}t_1 + \ldots + \lambda_i^{(q)}t_q}{1 - \delta_1\lambda_i^{(1)} - \ldots - \delta_q\lambda_i^{(q)}}\Bigr)^{-p_i} = \int\!\cdots\!\int_{D_q}\frac{f_\delta(r_1, \ldots, r_q)\,dr_1\ldots dr_q}{\Bigl(1 - \dfrac{\sum_j t_jr_j}{1 - \sum_j r_j\delta_j}\Bigr)^P}, \tag{8}$$
an integral equation for the non-null joint density of $r_1, \ldots, r_q$. By substitution in (8), it is 
easily seen that 

$$f_\delta(r_1, \ldots, r_q) = \prod_i\Bigl(1 - \sum_j\delta_j\lambda_i^{(j)}\Bigr)^{p_i}\Bigl(1 - \sum_j\delta_jr_j\Bigr)^{-P}f(r_1, \ldots, r_q), \tag{9}$$

where $f(r_1, \ldots, r_q)$ is the solution of (5), the null density. A similar, but incorrect, extension 
of Madow's (1945) result to the joint distribution has been noted by Quenouille (1949). 
(9) shows that the distributions of the multiple serial correlation ry, ,q2,) and the 
partial serial correlation 7,,,,1.2..,.¢ are independent of 6,,...,d,. This fact for ¢ = 1 has 
been noted by Dixon (1944) for the means and variances of these statistics and by Jenkins 
(1954) for the distributions. 

The integral equations may also be used to demonstrate the ultimate normality of the 
joint distribution when the sample size tends to infinity. By a different method, Hsu (1946) 
has examined this question for individual ratios. 


3. THE EXACT DENSITY FUNCTION 


From §2, it is sufficient to consider the null case. Then 

$$r_j = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(x_{i+j} - \bar{x})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \tag{10}$$

where $x_{N+i} = x_i$ and where $x_1, x_2, \ldots, x_N$ are independent standard normal variables. The 
canonical form of (10) is 


(10°) 


Anderson (1942) has given the exact marginal distributions of the $r_j$. For our purposes, we 
will require only to exhibit the distribution of the latent roots $\cos(2\pi jk/N)$. For any fixed $j$, 

$$\cos\frac{2\pi j(N-k)}{N} = \cos\frac{2\pi jk}{N},$$

so that the roots are equal in pairs with the possible exception of 
the root, minus one, the least root. Thus the following formulation will be seen to include 
our problem. Suppose that 

$$r_1 = \frac{\lambda w + \lambda_1w_1 + \ldots + \lambda_nw_n}{w + w_1 + \ldots + w_n}, \quad r_2 = \frac{\mu w + \mu_1w_1 + \ldots + \mu_nw_n}{w + w_1 + \ldots + w_n}, \quad \ldots, \quad r_q = \frac{\tau w + \tau_1w_1 + \ldots + \tau_nw_n}{w + w_1 + \ldots + w_n}, \tag{11}$$

where $w_1, \ldots, w_n$ and $w$ are independent gamma variables, the first $n$ variables being gamma-1 
and the last gamma-$p$,† where $\lambda_1, \ldots, \lambda_n$ are all greater than $\lambda$ and $\mu_1, \ldots, \mu_n$ all greater than $\mu$ 
and $\tau_1, \ldots, \tau_n$ all greater than $\tau$. (The problem imposes more restrictions, to be stated later, 
on the coefficients of the $w$'s.) It is then sufficient to find the joint distribution of 

$$s_1 = r_1 - \lambda = \frac{\sum_{i=1}^{n}\lambda_i'w_i}{w + \sum_{i=1}^{n}w_i} = \frac{l'}{v}, \quad \ldots, \quad s_q = r_q - \tau = \frac{\sum_{i=1}^{n}\tau_i'w_i}{w + \sum_{i=1}^{n}w_i} = \frac{t'}{v}, \tag{12}$$

with $\lambda_i' = \lambda_i - \lambda$, ..., $\tau_i' = \tau_i - \tau$. 

The region of joint variation of $s_1, s_2, \ldots, s_q$ is the least convex polyhedron enclosing the 
points $(0, 0, \ldots, 0)$, $(\lambda_1', \mu_1', \ldots, \tau_1')$, ..., $(\lambda_n', \mu_n', \ldots, \tau_n')$, a polyhedron in the positive quadrant 
of the $q$-space of $s_1, s_2, \ldots, s_q$. It is convenient to drop the primes for the moment. 

† If $N$ is even, $p = \tfrac12$; if $N$ is odd, $p = 0$; evidently $p + n = \tfrac12N$. 

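The joint characteristic function of $l, m, \ldots, t$ and $v$ is 

$$\phi(\theta_1, \theta_2, \ldots, \theta_q, \theta_{q+1}) = \prod_{j=1}^{n}(1 - i\theta_1\lambda_j - i\theta_2\mu_j - \ldots - i\theta_q\tau_j - i\theta_{q+1})^{-1}\,(1 - i\theta_{q+1})^{-p}. \tag{13}$$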
The right-hand side of (13) admits a partial fraction expansion (Watson, 1951) so that we 
have, for q<n, 











¢ 
P94, «++ g41) = % >: >in Sadeodg ___ — » (14) 
a, My, ss+ Ty , = 
|Aj, Pig Tig 
where a Mig Mig Tae (15) 
“hr d2+++Iq 2 ri 
| + By; 73 | 
4 Ay, By, > Tl 
Ti. oe. 2 | 
od 


|1 A; Mig +++ Ty 


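provided the $\lambda$'s, $\mu$'s, ..., $\tau$'s are such that all the $C$'s are finite. The general term on the 
right-hand side of (14) may be recognized as the joint characteristic function of 

$$\begin{aligned}
l' &= \lambda_{j_1}w_{j_1} + \lambda_{j_2}w_{j_2} + \ldots + \lambda_{j_q}w_{j_q},\\
m' &= \mu_{j_1}w_{j_1} + \mu_{j_2}w_{j_2} + \ldots + \mu_{j_q}w_{j_q},\\
&\;\;\vdots\\
t' &= \tau_{j_1}w_{j_1} + \tau_{j_2}w_{j_2} + \ldots + \tau_{j_q}w_{j_q},\\
v' &= w_{j_1} + w_{j_2} + \ldots + w_{j_q} + w',
\end{aligned} \tag{16}$$

where $w'$ is gamma $n + p - q$ and is independent of $w_{j_1}, w_{j_2}, \ldots, w_{j_q}$. The joint density of these 
variables may be found ab initio to be (writing sign $x$ = sgn $x$ so that $|x| = x\,\text{sgn}\,x$) 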


| A;, woe Ty, | 
sgn : | [ti te wp... fe rr 
| Aj, eee Tiq } e~-v yntp—q-1 l Aj, yj, er Tj, (17) 


Pan ty pee oreo 


| 
| 
| 
1 A;, Mig + Tj. | 
Thus, applying the Fourier inversion operator to (14), using (17) and integrating $v$ out 
from 0 to $\infty$, the joint density of $s_1, s_2, \ldots, s_q$ is found to be 





























i: db yt) Oe a : 
lt > ; 1 h | 
- Mn = sgn : 
: ; , ° : | Ay «+ T49| 
(n+p) _ . 5 Se Me et PA 
D(1 +P —Q) Gy, ...,59) € Solis ---580) Pe Me es Be 
| A;, see Tj, p } 1 r 
s II ji Ty, 
"| GAG de : 
Re twee Th : ‘ 
j jq | 
q ® | 1 A;, vee Tq | 
since the Jacobian of $l, m, \ldots, t, v$ to $s_1, s_2, \ldots, s_q, v$ is $v^q$. The terms in the sum (18) depend on
the region of the density (17), i.e. the term $(j_1, \ldots, j_q)$ belongs to $S(s_1, \ldots, s_q)$, provided
$s_1, s_2, \ldots, s_q$ are possible values of $l'/v, \ldots, t'/v$ as defined in (16). This means that $s_1, \ldots, s_q$
must fall in the least convex polyhedron with vertices $(0,0,\ldots,0)$, $(\lambda_{j_1}, \mu_{j_1}, \ldots, \tau_{j_1})$, ...,
$(\lambda_{j_q}, \mu_{j_q}, \ldots, \tau_{j_q})$. In this region
cs 8, 
1 Aj, Mj, Ti, | 
; | 
1 Aig Mig Talo (19) 
| ee 0 0 
PY Ag my oo Me 
1 Ay, Mig Me | 


As Quenouille has pointed out the general term in the sum is associated with a hyperplane
through the points $(\lambda_{j_1}, \mu_{j_1}, \ldots, \tau_{j_1})$, ..., $(\lambda_{j_q}, \mu_{j_q}, \ldots, \tau_{j_q})$ and is, in fact, proportional to the
$(n+p-q-1)$th power of the distance of $(s_1, s_2, \ldots, s_q)$ from this plane.

Returning now to the original problem and its notation, we see that $f(r_1, r_2, \ldots, r_q)$ is
given by


2 ee ae 
} 1 Aj, fj, eee Tj, | ogn| Ai Min Th 
Mn +p) i. we. Mee Se i thes ge 
D(n+P—-QDu,..i0eR|1 A wo. TP ee es | 
| A Be soe, Me oe 1 Aj, Pj oe Ty | 
2 : | I#Guerdd| 2: : : | 
| 1 Aj, Ljq ous T 49 ' L 45, fig bi Tig 


where the set $R$ is all $(j_1, \ldots, j_q)$ such that $(r_1, \ldots, r_q)$ is contained in the least convex
polyhedron with vertices $(\lambda, \mu, \ldots, \tau)$, $(\lambda_{j_1}, \mu_{j_1}, \ldots, \tau_{j_1})$, ..., $(\lambda_{j_q}, \mu_{j_q}, \ldots, \tau_{j_q})$. In this region, the term in
(20) raised to the power $p$ is positive by (19). This is the required distribution. It agrees with
Quenouille's distribution when $p = 0$; this is the case when $N$ is odd. When $N$ is even,
$p = \frac12$. Furthermore, when $q = 1$, it reduces to Anderson's (1942) distribution for all $N$.

Quenouille has noted that there are several possible expansions for $f(r_1, r_2, \ldots, r_q)$ when
$N$ is odd, i.e. when $p = 0$, $\lambda = 0$, $\mu = 0$, ..., $\tau = 0$. In this case the joint density of $r_1, r_2, \ldots, r_q$
is given by








iy, & — & Pe ' ' 
a we. My r | Aj, Ty, 
- " J sgn | : : 
I Xr. , ¥ | Xj, coe T4q 
['(n) 4 4 Gq P45, «oe Tq (21) 
P'(n—9) G,,...,49 €R 1 Ay My we 7% | , 


Il 1 Aj, My, vee Ty 


1 Aj, Mig ++ The 


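with the summation rule: $(j_1, \ldots, j_q) \in R(r_1, \ldots, r_q)$ provided $(r_1, r_2, \ldots, r_q)$ falls in the convex
closure of $(0,0,\ldots,0)$, $(\lambda_{j_1}, \mu_{j_1}, \ldots, \tau_{j_1})$, ..., $(\lambda_{j_q}, \mu_{j_q}, \ldots, \tau_{j_q})$. The domain of the variables
$r_1, r_2, \ldots, r_q$ is, of course, the convex closure of $(\lambda_j, \mu_j, \ldots, \tau_j)$ $(j = 1, \ldots, n)$. Suppose now that
the origin is moved so that we have new variables $r_j^* = r_j + h_j$ $(j = 1, \ldots, q)$ with

$$\lambda_j^* = \lambda_j + h_1, \quad \ldots, \quad \tau_j^* = \tau_j + h_q.$$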


The joint density of the $r_j^*$ is simply (21) with asterisks on the $r$'s, $\lambda$'s, $\mu$'s, ..., and $\tau$'s and
the summation rule is found similarly. But this density can be rewritten as

















l % ft). ws t% et 
> ’ | La . & 
2A, fy, « Th, | sen| : : 
 -. | , 
j j 
Mn) Eee. ee (22) 
P'(n—) G,.7..,59) € R* Ye A; Mj eee Fe] 
Il 1 Aj, My, Ty, 
G+#51, .os5e : : 
/ 1 Aj, Hig rn 


which is the same as (21) except for the sign term and the summation rule. But the values 
of the two densities are the same so that, by equating them, identities are obtained. More 
usefully, however, it may be observed that this is equivalent to using an arbitrary origin to 
obtain the sign term and the summation rule. Hence, to calculate (21), one may move the
origin so that the point $(r_1, r_2, \ldots, r_q)$ is included in the least number of simplexes with
vertices, the origin and $(\lambda_{j_1}, \mu_{j_1}, \ldots, \tau_{j_1})$, ..., $(\lambda_{j_q}, \mu_{j_q}, \ldots, \tau_{j_q})$, and so use the shortest possible
summation.


4. AN APPROXIMATE DENSITY FUNCTION


In §3, the expression (21) gives the joint density of $r_1, r_2, \ldots, r_q$, where $r_1, r_2, \ldots, r_q$ are defined
by (10) in a sample of size $N = 2n+1$. The restriction is $q<n$ or $2q+1<N$, whereas
Quenouille's argument requires $2q+3<N$. If, then, $q$ is taken as $n-1$, the density is seen
to be independent of the values of $r_1, r_2, \ldots, r_{n-1}$; in fact, it is a constant.† This result leads
to an interesting method of approximating the true density function.

Suppose the true polyhedral region of joint variation of $r_1, r_2, \ldots, r_{n-1}$ is replaced by the
(convex) region in which the matrix

$$R_{n-1} = \begin{bmatrix} 1 & r_1 & r_2 & \cdots & r_{n-1} \\ r_1 & 1 & r_1 & \cdots & r_{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{n-1} & r_{n-2} & r_{n-3} & \cdots & 1 \end{bmatrix} \tag{23}$$

is positive-definite. It may be verified that this region includes the true polyhedral region
and that the vertices of the latter lie on its boundary. The variables $r_{n-1}, \ldots, r_{q+1}$ may be
integrated over this region, using a formula from Quenouille,


\( = ) = Tern” | Ea) (24) 


to obtain the approximate density 











1 Te+4)/ |%,| \"*"* ‘ 
( ) (25) 


Keg Was 0005 Sak = n~ta 
f( a7 "2 a JL, T(s) |R. 
An equivalent method is to find the joint density of $r_1, r_{13.2}, \ldots, r_{1n.23\ldots}$. It is easily seen
to be proportional to


(1 — Ins. 98...) (1 —Tin-2.23...)° iow. (26) 


If then $r_1, r_{13.2}, \ldots, r_{1n.23\ldots}$ are all assumed to have the range $(-1,1)$, they are seen to be
independent and the density (25) is obtainable from (26).
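The relaxed region used above, in which the symmetric Toeplitz matrix (23) built from the serial correlations is positive-definite, is easy to test numerically. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and positive-definiteness is checked by attempting a Cholesky factorization.

```python
import math


def toeplitz_correlation(r):
    """Symmetric Toeplitz matrix with unit diagonal and r[k-1] on the k-th off-diagonals."""
    first = [1.0] + list(r)
    size = len(first)
    return [[first[abs(i - j)] for j in range(size)] for i in range(size)]


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; it succeeds exactly for positive-definite input."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # a non-positive pivot means the matrix is not positive-definite
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


def in_relaxed_region(r):
    """True when (r_1, ..., r_k) lies in the region where the Toeplitz matrix is positive-definite."""
    return is_positive_definite(toeplitz_correlation(r))
```

For instance, `in_relaxed_region([0.5, 0.25])` holds, while `in_relaxed_region([0.9, -0.9])` does not.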

The density (25) applies, when $N = 2n+1$, for serial correlations corrected for the mean.
Quenouille's conjectured density function, for arbitrary $N$ and serial correlations
uncorrected for the mean, is





rte aoe) ( Le pe (27) 


s=1 I'(3N —s+3) Ry 


Expressions (25) and (27) do not differ by unity in the value of N as might have been 
expected. Expression (27) has been criticized by Watson (1951), and Jenkins (1954) has 
replaced it, for q = 2, by a new expression which gives correct moments up to order n. Far 
less is known about the case when mean corrections are made. However, Jenkins’s work 
for q = 2 leads us to the conclusion that (25) gives incorrect moments and is therefore 
unacceptable. 


† It is of interest to note that this result may be deduced directly, by elementary methods, from
the constancy of the density of the gaps, made by $n-1$ random points on the unit interval.

‡ Of course, comparison of (25) or (26) with Daniels's results shows explicitly where they are
inadequate. His results apply only when $q$ is $O(1)$ with respect to $N$. Our results are exact when
$q = \frac12(N-1)$ and may still have some merit when $q$ is $O(N)$.
THE APPROXIMATE DISTRIBUTION OF SERIAL
CORRELATION COEFFICIENTS


By H. E. DANIELS


Statistical Laboratory, University of Cambridge 


1. INTRODUCTION AND SUMMARY 


Hotelling’s suggestion of a ‘circular’ definition for the serial correlation coefficient was 
followed by considerable progress in the distribution theory of such modified statistics by 
R. L. Anderson (1942), Koopmans (1942), Dixon (1944), Madow (1945) and others. The 
exact distribution is known for the circular coefficient of any lag from an uncorrelated normal 
process and, more generally, from a circularly modified normal process of autoregressive 
type. Quenouille (1949) obtained by the same method the exact joint distribution of 
circular coefficients of different lags. 

The exact distributions are complicated, and a simple and accurate approximation to the 
distribution of the circular coefficient with known mean was found by Dixon (1944) and Rubin 
(1945) for the uncorrelated normal process. It was extended to the case of a circular Markov 
process by Leipnik (1947) following a method due to Madow (1945). The approximation 
depends on the device of smoothing summation over a discrete set of roots by an approxi- 
mating integration. Quenouille (1949) conjectured a similar approximate form for the joint 
distribution, but Watson (1951) and Jenkins (1954) showed that the conjectured form could 
not be correct. Jenkins developed the correct analogous approximation for the joint 
distribution of coefficients of lags 1 and 2 with known means. 

Without circular modifications the distributional theory is difficult and the field is largely 
unexplored. For testing independence T. W. Anderson (1948) gave an approximate table 
of significance points for non-circular lag 1 coefficients with known and fitted means. Watson 
& Durbin (1951) introduced modified non-circular definitions of the coefficients which have 
R. L. Anderson’s distribution in the uncorrelated case. The case of an unmodified auto- 
regressive process has not been much discussed, though a method due to Bartlett (1953, 
1954) is available for obtaining approximate confidence intervals for the parameters, and 
Quenouille’s (1947) approximate goodness of fit tests should be noted. 

In the present paper an approach based on the method of steepest descents is adopted to 
derive the known approximate distributions and to generalize them.* (For an account of 
the method see, for example, Jeffreys & Jeffreys (1956), Daniels (1954).) The analogue of 
Leipnik’s approximation is found for the distribution of an unmodified coefficient of lag 1, 
both with known and fitted mean, when the process is of unmodified Markov type. The 
approximate joint distribution of m successive partial serial correlations is found for an 
autoregressive process of the mth order, circular modifications being used. The work on the 
unmodified Markov process could be extended to the general case but we have not done this. 


* I have since learnt that Jenkins (1956) has independently obtained many of the results given here
by consideration of the moments of smoothed distributions. Of our inevitable differences in notation,
the one most likely to confuse is the difference in sign of the autoregression coefficients $\alpha_i$. My usage,
which is that of Bartlett and Quenouille, is adopted for reasons of symmetry.
2. THE SADDLEPOINT APPROXIMATION


In this section some general theory is developed which is fundamental to the rest of the work. 
We are interested in the distribution of statistics of the form $r = c/c_0$, where $c_0$ is
non-negative. If $c_0$, $c$ have a joint probability density $f(c_0, c)$ the density for $r$ is

$$h(r) = \int_0^\infty c_0\, f(c_0, rc_0)\, dc_0. \tag{2-1}$$
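As a quick numerical illustration of the ratio-density formula (2-1), the sketch below integrates $c_0 f(c_0, rc_0)$ by Simpson's rule. The toy joint density (two independent unit exponentials) is an assumption chosen only because the answer, $1/(1+r)^2$ for $r>0$, is known in closed form; the function names and the truncation point of the integral are likewise illustrative.

```python
import math


def ratio_density(r, joint_density, upper=60.0, steps=6000):
    """Approximate h(r) = integral over c0 >= 0 of c0 * f(c0, r*c0), truncated at `upper`."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    width = upper / steps
    total = 0.0
    for k in range(steps + 1):
        c0 = k * width
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * c0 * joint_density(c0, r * c0)
    return total * width / 3.0


def exponential_pair(c0, c):
    """Toy joint density (an assumption): c0 and c independent unit exponentials."""
    return math.exp(-c0 - c) if c0 >= 0.0 and c >= 0.0 else 0.0
```

With this toy density, `ratio_density(1.0, exponential_pair)` should be close to 0.25.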
Let

$$M(T_0, T) = E\, e^{T_0 c_0 + Tc}$$

be the joint moment-generating function for $c_0$ and $c$. We are concerned only with cases where
$M(T_0, T)$ exists in strips of non-zero width containing the imaginary axes in the $T_0$ and
$T$ planes. The usual Fourier inversion formula is most conveniently written as


$$f(c_0, c) = \frac{1}{(2\pi i)^2} \iint M(T_0, T)\, e^{-T_0 c_0 - Tc}\, dT_0\, dT, \tag{2-2}$$

the integration being taken along the imaginary axes of $T_0$ and $T$, or any allowable
deformations of these paths. In particular,

$$f(c_0, rc_0) = \frac{1}{(2\pi i)^2} \iint M(T_0, T)\, e^{-(T_0 + rT)c_0}\, dT_0\, dT = \frac{1}{(2\pi i)^2} \iint M(u - rT, T)\, e^{-uc_0}\, du\, dT, \tag{2-3}$$

where the integration of $u = T_0 + rT$ is taken over a similar path in the $u$ plane. Inversion
of the transform with respect to $u$ gives

$$\int_0^\infty f(c_0, rc_0)\, e^{uc_0}\, dc_0 = \frac{1}{2\pi i} \int M(u - rT, T)\, dT, \tag{2-4}$$


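so that, when differentiation is permissible,

$$\int_0^\infty c_0\, f(c_0, rc_0)\, e^{uc_0}\, dc_0 = \frac{1}{2\pi i} \int \frac{\partial M(u - rT, T)}{\partial u}\, dT$$

and

$$h(r) = \frac{1}{2\pi i} \int \left[\frac{\partial M(u - rT, T)}{\partial u}\right]_{u=0} dT. \tag{2-5}$$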

This is a form of Geary’s (1944) extension of Cramér’s theorem. 
However, we often want to transform $T$ to some other variable $z$. In (2-4) put

$$T = T(z, u), \tag{2-6}$$

where $T(z, 0)$ maps the $T$ plane on to some region in the $z$ plane, and $\partial T/\partial z$ does not vanish
anywhere on the contour except perhaps at its termini. Proceeding as before, we find

$$h(r) = \frac{1}{2\pi i} \int \left[\frac{\partial}{\partial u}\left\{M(u - rT, T)\,\frac{\partial T}{\partial z}\right\}\right]_{u=0} dz, \tag{2-7}$$

integration being along the transformed contour in the $z$ plane. This is the form used in the
sequel, but the alternative form

$$h(r) = \frac{1}{2\pi i} \int \left[\frac{\partial M(u - rT, T)}{\partial u}\,\frac{\partial T}{\partial z}\right]_{u=0} dz$$

is sometimes easier to work with in other applications. 











Our main task is the approximate evaluation of integrals of type (2-7) when the statistics
$c_0$, $c$ are calculated from a moderately large sample. In the cases considered it is found that
the integrand can be written as $\phi(z)[\psi(z)]^n$, where $n$ is the sample size, and the method of
steepest descents can be applied. The contour is chosen to pass through a suitable
saddlepoint $\hat z$ at which $\psi'(\hat z) = 0$. It is taken to be a curve of steepest descent, upon which $|\psi(z)|$
decreases most rapidly on either side of $\hat z$, and this is one of the branches of

$$\arg \psi(z) = \arg \psi(\hat z)$$

(see Jeffreys & Jeffreys, 1956). In all the applications considered here the appropriate $\hat z$ is
found to be real, the corresponding contour of steepest descent through it intersects the real
axis normally, and there is no other saddlepoint on the contour so that $|\psi(z)|$ decreases
steadily on either side of $\hat z$. As $n$ becomes large the major contribution to the integral
evidently arises from a small neighbourhood around $\hat z$.
The usual procedure is now to expand the integrand in a particular manner about $\hat z$, and
by termwise integration obtain an asymptotic expansion in powers of $n^{-1}$ whose dominant
term is called the saddlepoint approximation. In that way we would find


$$\frac{1}{2\pi i} \int \phi(z)[\psi(z)]^n\, dz \sim \left\{\frac{\psi(\hat z)}{2\pi n \psi''(\hat z)}\right\}^{1/2} \phi(\hat z)\,[\psi(\hat z)]^n, \tag{2-9}$$

the relative error committed being $O(n^{-1})$. But in our applications $\psi(z)$ is such that either
the integration can be effected exactly at this stage, or at worst only $\phi(z)$ need be expanded.
The dominant term is still the saddlepoint approximation, except that $(2\pi n)^{-1/2}$ is replaced
by a slightly different constant.
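A small numerical check may make the saddlepoint formula (2-9) concrete. The example below is not taken from the paper: it assumes the textbook choice $\phi(z) = 1/z$, $\psi(z) = e^z/z$, for which the contour integral round the origin equals $n^n/n!$ and the saddlepoint is $\hat z = 1$, so that (2-9) reproduces Stirling's formula. The function names are hypothetical.

```python
import math


def exact_coefficient_integral(n):
    """(1/(2*pi*i)) * contour integral of z**-1 * (exp(z)/z)**n round the origin = n**n / n!."""
    return math.exp(n * math.log(n) - math.lgamma(n + 1))


def saddlepoint_value(n):
    """Dominant term of (2-9) with phi(z) = 1/z, psi(z) = exp(z)/z and saddlepoint z_hat = 1."""
    z_hat = 1.0  # psi'(z) = exp(z) * (z - 1) / z**2 vanishes at z = 1
    phi = 1.0 / z_hat
    psi = math.exp(z_hat) / z_hat
    # Second derivative of exp(z)/z evaluated at z_hat.
    psi_second = math.exp(z_hat) * (1.0 / z_hat - 2.0 / z_hat**2 + 2.0 / z_hat**3)
    return math.sqrt(psi / (2.0 * math.pi * n * psi_second)) * phi * psi**n


for n in (5, 20, 80):
    ratio = saddlepoint_value(n) / exact_coefficient_integral(n)
    print(n, ratio - 1.0)  # relative error, which shrinks roughly like 1/n
```

The printed relative errors decrease in proportion to $1/n$, in line with the stated order of the remainder.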

In discussing the joint distribution of the statistics $r_s = c_s/c_0$ $(s = 1, 2, \ldots, m)$, we make use
of an extension of (2-7) established in exactly the same way. Let

$$M(T_0, T_1, \ldots, T_m) = E\, e^{T_0 c_0 + T_1 c_1 + \ldots + T_m c_m}.$$

Write $u = T_0 + r_1 T_1 + r_2 T_2 + \ldots + r_m T_m$ and let

$$T_s = T_s(z_1, z_2, \ldots, z_m, u) \qquad (s = 1, 2, \ldots, m)$$

be a suitable transformation. Then the joint probability density of $r_1, r_2, \ldots, r_m$ is

$$h(r_1, \ldots, r_m) = \frac{1}{(2\pi i)^m} \int \cdots \int \left[\frac{\partial^m}{\partial u^m}\left\{M(u - r_1 T_1 - \ldots - r_m T_m, T_1, \ldots, T_m)\,\frac{\partial(T_1, \ldots, T_m)}{\partial(z_1, \ldots, z_m)}\right\}\right]_{u=0} dz_1 \ldots dz_m. \tag{2-10}$$

The integral can again be approximately evaluated by using contours of steepest descent
in the planes of $z_1, \ldots, z_m$ passing through saddlepoints $\hat z_1, \ldots, \hat z_m$. The details of the work
are most easily studied in the actual application of §§ 9-11.


3. CIRCULAR MARKOV PROCESS. KNOWN MEAN


We first discuss the distribution of the circularly defined lag 1 serial correlation coefficient
when the process is of circular Markov type with normal residuals and zero means. Let
$x_1, x_2, \ldots, x_n$ be such that

$$x_s - \rho x_{s-1} = \epsilon_s, \qquad x_{n+s} \equiv x_s \qquad (s = 1, 2, \ldots, n),$$

where $\epsilon_1, \ldots, \epsilon_n$ are independent $N(0, 1)$ variables. The joint distribution of the $x$'s is

$$dF = \frac{1-\rho^n}{(2\pi)^{\frac12 n}} \exp\left[-\tfrac12\{(1+\rho^2)(x_1^2 + \ldots + x_n^2) - 2\rho(x_1x_2 + x_2x_3 + \ldots + x_{n-1}x_n + x_nx_1)\}\right] dx_1 \ldots dx_n. \tag{3-1}$$
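The sample estimate of $\rho$ is taken to be $r = c/c_0$, where

$$c_0 = x_1^2 + \ldots + x_n^2, \qquad c = x_1x_2 + \ldots + x_{n-1}x_n + x_nx_1.$$

The moment-generating function for $c_0$, $c$ is

$$M(T_0, T) = E\, e^{T_0 c_0 + Tc} = (1-\rho^n)\,|A|^{-1/2},$$

where

$$A = \begin{bmatrix}
1+\rho^2-2T_0 & -(\rho+T) & 0 & 0 & \cdots & 0 & 0 & -(\rho+T) \\
-(\rho+T) & 1+\rho^2-2T_0 & -(\rho+T) & 0 & \cdots & 0 & 0 & 0 \\
0 & -(\rho+T) & 1+\rho^2-2T_0 & -(\rho+T) & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -(\rho+T) & 1+\rho^2-2T_0 & -(\rho+T) \\
-(\rho+T) & 0 & 0 & 0 & \cdots & 0 & -(\rho+T) & 1+\rho^2-2T_0
\end{bmatrix}.$$

Evaluating its determinant as a circulant we find

$$|A| = (\rho+T)^n\,\frac{(1-z^n)^2}{z^n}, \qquad z + \frac{1}{z} = \frac{1+\rho^2-2T_0}{\rho+T}, \tag{3-2}$$

and so, writing $u = T_0 + rT$,

$$M(u - rT, T) = \frac{(1-\rho^n)(1-2rz+z^2)^{\frac12 n}}{(1-z^n)(1-2\rho r+\rho^2-2u)^{\frac12 n}}, \tag{3-3}$$

where

$$\rho + T = \frac{z(1-2\rho r+\rho^2-2u)}{1-2rz+z^2}. \tag{3-4}$$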
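The circulant determinant in (3-2) can be checked numerically for a small matrix. The sketch below is an illustration only: it builds the $n \times n$ circulant with diagonal $1+\rho^2-2T_0$ and neighbouring entries $-(\rho+T)$, evaluates its determinant by Gaussian elimination, and compares the result with $(\rho+T)^n(1-z^n)^2/z^n$, where $z$ is one root of $z + 1/z = (1+\rho^2-2T_0)/(\rho+T)$. The helper names and the parameter values are assumptions chosen for the example.

```python
import cmath


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det


def circulant_matrix(n, rho, t0, t):
    """Circulant with diagonal 1 + rho^2 - 2*T0 and cyclic neighbours -(rho + T)."""
    diag = 1.0 + rho * rho - 2.0 * t0
    off = -(rho + t)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = diag
        m[i][(i + 1) % n] += off
        m[i][(i - 1) % n] += off
    return m


def closed_form(n, rho, t0, t):
    """(rho + T)^n * (1 - z^n)^2 / z^n with z + 1/z = (1 + rho^2 - 2*T0) / (rho + T)."""
    ratio = (1.0 + rho * rho - 2.0 * t0) / (rho + t)
    z = (ratio - cmath.sqrt(ratio * ratio - 4.0)) / 2.0
    return (rho + t) ** n * (1.0 - z**n) ** 2 / z**n
```

Either root of the quadratic for $z$ may be used, since the closed form is unchanged when $z$ is replaced by $1/z$.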





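The results of § 2 can now be used. Taking $T = T(z, u)$ of (2-6) to be defined by (3-4) we
have

$$\frac{\partial T}{\partial z} = \frac{(1-z^2)(1-2\rho r+\rho^2-2u)}{(1-2rz+z^2)^2},$$

and (2-7) becomes

$$h(r) = \frac{(n-2)(1-\rho^n)}{2\pi i\,(1-2\rho r+\rho^2)^{\frac12 n}} \int \frac{(1-z^2)(1-2rz+z^2)^{\frac12 n-2}}{1-z^n}\, dz, \tag{3-5}$$

with

$$\rho + T = \frac{z(1-2\rho r+\rho^2)}{1-2rz+z^2}. \tag{3-6}$$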
On examining the successive transformations 


7 1 — 2pr + p* 
ogee Gore 2p+T) ’ 
which together form (3-6), it will be seen that the region $|z| < 1$ is mapped on to the whole
$T$ plane cut along the parts of the real axis exterior to the interval

$$\{-(1+\rho)^2/2(1+r),\ (1-\rho)^2/2(1-r)\}.$$

Any path in the $T$ plane running from $-i\infty$ through the gap in the real axis to $+i\infty$
corresponds to a path in $|z| < 1$ running from $e^{-i\theta}$ to $e^{i\theta}$, where $r = \cos\theta$. The path of
integration for (3-5) is therefore of this form.

















Consider the factor $(1-2rz+z^2)^{\frac12 n-2} = \{1-r^2+(z-r)^2\}^{\frac12 n-2}$ in the integrand. It has a real
saddlepoint at $\hat z = r$, through which the path of steepest descent is the branch of
$\arg(1-2rz+z^2) = 0$ crossing the real axis orthogonally. This is just the straight line joining
the points $e^{-i\theta}$, $e^{i\theta}$, and we choose it to be the path of integration for (3-5).

So far (3-5) is exact, but if we are prepared to tolerate errors which are 'exponentially
small', i.e. of magnitude $O(\lambda^n)$ for some $|\lambda| < 1$, the factor $1/(1-z^n)$ may be ignored provided
neither $r$ nor $\rho$ is near $\pm 1$. This assertion is discussed in the next section; assuming it to be
true and putting $z = r + iw(1-r^2)^{1/2}$ $(-1<w<1)$, we find that, ignoring $\rho^n$,

$$h(r) \sim \frac{(n-2)(1-r^2)^{\frac12(n-1)}}{2\pi(1-2\rho r+\rho^2)^{\frac12 n}} \int_{-1}^{1} (1+w^2)(1-w^2)^{\frac12 n-2}\, dw = \frac{\Gamma(\frac12 n+1)}{\pi^{1/2}\,\Gamma(\frac12 n+\frac12)}\,\frac{(1-r^2)^{\frac12(n-1)}}{(1-2\rho r+\rho^2)^{\frac12 n}}, \tag{3-7}$$


which is Leipnik's approximation. The smoothing procedure is thus equivalent to ignoring
the factor $1/(1-z^n)$. If this factor is expanded in powers of $z^n$ we obtain on integration the
series





P(3n +1) ( 1—p” Mls ar 1) 3  selalasbed 
MO = TAL —2pr+phin| Tan+h) 2 TEnh) rat 
5 q2” 


ee _ —__ (] — ¢2)\hGn—D _ . 
+ >2nT(En +4) qa r ) re Ie (3 8) 


which confirms Dixon's observation that when $\rho = 0$ the moments of Leipnik's approximation
about the origin agree with those of the exact distribution for all orders up to $n$, since
the contributions from terms after the first reduce to zero on partial integration.
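Leipnik's approximation (3-7) is simple enough to evaluate directly. The sketch below computes the density $\Gamma(\frac12 n+1)\,(1-r^2)^{\frac12(n-1)} / \{\pi^{1/2}\Gamma(\frac12 n+\frac12)(1-2\rho r+\rho^2)^{\frac12 n}\}$ on a logarithmic scale and integrates it by Simpson's rule as a sanity check on the normalizing constant. The function names and the step count are illustrative assumptions.

```python
import math


def leipnik_density(r, rho, n):
    """Leipnik's approximate density (3-7) of the circular lag 1 coefficient, known mean."""
    if abs(r) >= 1.0:
        return 0.0
    log_const = (math.lgamma(0.5 * n + 1.0) - math.lgamma(0.5 * n + 0.5)
                 - 0.5 * math.log(math.pi))
    log_shape = (0.5 * (n - 1) * math.log(1.0 - r * r)
                 - 0.5 * n * math.log(1.0 - 2.0 * rho * r + rho * rho))
    return math.exp(log_const + log_shape)


def simpson(func, lower, upper, steps=4000):
    """Composite Simpson rule; `steps` must be even."""
    width = (upper - lower) / steps
    total = func(lower) + func(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lower + k * width)
    return total * width / 3.0


# Total probability over (-1, 1) for n = 20, rho = 0.5; this should be very close to 1.
mass = simpson(lambda r: leipnik_density(r, 0.5, 20), -1.0, 1.0)
print(round(mass, 6))
```

For $n = 20$ the ordinates produced this way can be compared with the $h(r)$ columns of Table 1, for example at $r = 0$ with $\rho = 0$.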


4. THE ERROR OF THE APPROXIMATION


Here we determine an upper bound to the error incurred by the approximation. The effect
of ignoring the factor $1/(1-z^n)$ is to alter the integral in (3-5) by an amount

$$\delta = \int \frac{z^n(1-z^2)(1-2rz+z^2)^{\frac12 n-2}}{1-z^n}\, dz,$$

where $z = r + iw(1-r^2)^{1/2}$ $(-1<w<1)$. Since

$$|z|^2 = r^2 + w^2(1-r^2),$$

$$|1-z^2| = \{(1-r^2)[1-r+w^2(1+r)][1+r+w^2(1-r)]\}^{1/2} \le 2(1-r^2)^{1/2}, \qquad |1-z^n| \ge 1-|z|^n,$$

we have

$$|\delta| \le 4(1-r^2)^{\frac12 n-1} \int_0^1 \frac{[r^2+w^2(1-r^2)]^{\frac12 n}}{1-[r^2+w^2(1-r^2)]^{\frac12 n}}\,(1-w^2)^{\frac12 n-2}\, dw.$$
The first factor in the integrand keeps it small except near $w = \pm 1$, where the second factor
takes over since $1-[r^2+w^2(1-r^2)]^{\frac12 n} \sim \frac12 n(1-r^2)(1-w^2)$ only. Let $\int_0^1 = \int_0^k + \int_k^1$, where
$0<k<1$. Then

$$\int_0^k \le \frac{[r^2+k^2(1-r^2)]^{\frac12 n}}{1-[r^2+k^2(1-r^2)]^{\frac12 n}} \int_0^k (1-w^2)^{\frac12 n-2}\, dw = \frac{[r^2+k^2(1-r^2)]^{\frac12 n}}{1-[r^2+k^2(1-r^2)]^{\frac12 n}}\,\tfrac12 B(\tfrac12 n-1, \tfrac12)\{1-I_{1-k^2}(\tfrac12 n-1, \tfrac12)\}$$
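in the usual notation for complete and incomplete Beta functions. In $k<w<1$,

$$1-[r^2+w^2(1-r^2)]^{\frac12 n} \ge \{1-[r^2+k^2(1-r^2)]^{\frac12 n}\}\,\frac{1-w^2}{1-k^2},$$

since it is a concave function of $w^2$, so that

$$\int_k^1 \le \frac{1-k^2}{1-[r^2+k^2(1-r^2)]^{\frac12 n}} \int_k^1 (1-w^2)^{\frac12 n-3}\, dw = \frac{1-k^2}{1-[r^2+k^2(1-r^2)]^{\frac12 n}}\,\tfrac12 B(\tfrac12 n-2, \tfrac12)\, I_{1-k^2}(\tfrac12 n-2, \tfrac12).$$

Hence

$$|\delta| \le \frac{2(1-r^2)^{\frac12 n-1} B(\frac12 n-1, \frac12)}{1-[r^2+k^2(1-r^2)]^{\frac12 n}} \left\{[r^2+k^2(1-r^2)]^{\frac12 n}\,[1-I_{1-k^2}(\tfrac12 n-1, \tfrac12)] + \frac{n-3}{n-4}\,(1-k^2)\, I_{1-k^2}(\tfrac12 n-2, \tfrac12)\right\}.$$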





For each value of $r$ there is a best value of $k$ which has to be determined numerically. This
gives an upper bound for $|\delta|$ which when multiplied by $(n-2)/2\pi(1-2\rho r+\rho^2)^{\frac12 n}$ gives an
upper bound $A(r)$ to the absolute error in Leipnik's approximation. (More strictly the
factor $1-\rho^n$ should also be included when $\rho^n$ is not negligible.)


Table 1. Leipnik's approximate density h(r). Error < A(r). n = 20

           ρ = 0           |                        ρ = 0.5
    r     h(r)     A(r)    |     r     h(r)     A(r)    |     r     h(r)     A(r)
   0.0   1.8066   0.0050   |   -1.0     -        -      |   0.4    1.7511   0.0121
   0.1   1.6421   0.0048   |   -0.9     -        -      |   0.5    2.0860   0.0254
   0.2   1.2258   0.0042   |   -0.8     -        -      |
   0.3   0.7375   0.0034   |   -0.7     -        -      |   0.55   2.0870   0.0358
   0.4   0.3448   0.0024   |   -0.6   0.0001     -      |   0.60   1.9339   0.0490
   0.5   0.1175   0.0014   |   -0.5   0.0004     -      |   0.65   1.6221   0.0640
   0.6   0.0260   0.0007   |   -0.4   0.0023     -      |   0.70   1.1889   0.0776
   0.7   0.0030   0.0002   |   -0.3   0.0092     -      |   0.75   0.7185   0.0832
   0.8   0.0001     -      |   -0.2   0.0298   0.0001   |   0.80   0.3233   0.0725
   0.9     -        -      |   -0.1   0.0817   0.0002   |   0.85   0.0884   0.0431
   1.0     -        -      |    0.0   0.1940   0.0005   |   0.90   0.0092   0.0118
                           |    0.1   0.4059   0.0012   |   0.95   0.0001   0.0039
                           |    0.2   0.7526   0.0026   |   1.00     -        -
                           |    0.3   1.2317   0.0056   |





Calculations for the case $n = 20$, $\rho = 0$ and 0.5, are shown in Table 1. When $\rho = 0$ the
error is seen to be quite negligible in the tails of the distribution, but when $\rho = 0.5$ there is
a possibility that the upper tail may be materially affected at, say, the 1% level, since $A(r)$
is in that region comparable in magnitude with the ordinates. The error may of course be
less than $A(r)$ but cannot be ignored without a more thorough investigation.
5. CIRCULAR MARKOV PROCESS. UNKNOWN MEAN

When the true mean is unknown a suitable estimate of $\rho$ is $r = C/C_0$, where

$$C_0 = (x_1-\bar x)^2 + (x_2-\bar x)^2 + \ldots + (x_n-\bar x)^2 = c_0 - n\bar x^2,$$
$$C = (x_1-\bar x)(x_2-\bar x) + \ldots + (x_{n-1}-\bar x)(x_n-\bar x) + (x_n-\bar x)(x_1-\bar x) = c - n\bar x^2, \tag{5-1}$$

and $\bar x = (x_1+x_2+\ldots+x_n)/n$. The moment-generating function for $C_0$, $C$ is


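$$M(T_0, T) = E\, e^{T_0 C_0 + TC} = (1-\rho^n)\,\left|A + \frac{2}{n}(T_0+T)\,\mathbf{1}\mathbf{1}'\right|^{-1/2},$$

with $A$ as previously defined, and $\mathbf{1}' = [1, 1, \ldots, 1]$ so that $\mathbf{1}\mathbf{1}'$ is the $n \times n$ matrix with unit
elements. Since every row of $A$ sums to $(1-\rho)^2 - 2(T_0+T)$ the determinant can be written as

$$\left|A + \frac{2}{n}(T_0+T)\,\mathbf{1}\mathbf{1}'\right| = |A|\,|I + a\mathbf{1}\mathbf{1}'|, \qquad a = \frac{2}{n}\,\frac{T_0+T}{(1-\rho)^2 - 2(T_0+T)}.$$

The second factor is

$$|I + a\mathbf{1}\mathbf{1}'| = 1 + na = \frac{(1-\rho)^2}{(1-\rho)^2 - 2(T_0+T)},$$

so that

$$|I + a\mathbf{1}\mathbf{1}'| = \frac{(1-\rho)^2(1-2rz+z^2)}{(1-2\rho r+\rho^2-2u)(1-z)^2},$$

and hence

$$M(u - rT, T) = \frac{(1-\rho^n)(1-z)(1-2rz+z^2)^{\frac12(n-1)}}{(1-z^n)(1-\rho)(1-2\rho r+\rho^2-2u)^{\frac12(n-1)}}, \tag{5-2}$$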


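with $z$ defined by (3-6). Proceeding as in § 2 we find

$$h(r) = \frac{(n-3)(1-\rho^n)}{2\pi i\,(1-\rho)(1-2\rho r+\rho^2)^{\frac12(n-1)}} \int \frac{(1-z)(1-z^2)(1-2rz+z^2)^{\frac12(n-5)}}{1-z^n}\, dz, \tag{5-3}$$

with $z = r + iw(1-r^2)^{1/2}$ $(-1<w<1)$.
Ignoring the factor $1/(1-z^n)$ again introduces an exponentially small error. Neglecting
$\rho^n$ and evaluating the integral we find

$$h(r) \sim \frac{n\,\Gamma(\frac12 n-\frac12)}{2\pi^{1/2}\,\Gamma(\frac12 n)}\,\frac{(1-r^2)^{\frac12 n-1}}{(1-\rho)(1-2\rho r+\rho^2)^{\frac12(n-1)}} \left\{1 - r - \frac{1+r}{n}\right\}. \tag{5-4}$$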


As is evident from an expansion of the type (3-8), the moments of $r$ when $\rho = 0$ are exact
for all orders up to $n$. The fact that (5-4) can be negative when $1-r \sim O(n^{-1})$ is not in itself
important since approximations of this type in any case break down near $r = 1$. A modified
form of the approximation (equation (7-6)) is discussed in §7.


6. NON-CIRCULAR MARKOV PROCESS. KNOWN MEAN 


In testing for independence the use of a circularly defined sample serial correlation coefficient
may be objectionable on grounds of loss of power and sensitivity to extraneous trends. But
when $\rho \neq 0$ the circular definition of the process itself is artificial and can only be
justified if the results arrived at by its use are not substantially affected by the assumption
of circularity. We now examine the approximate distribution of the sample serial
correlation coefficient without introducing circularity assumptions.
The process is defined by

$$x_s - \rho x_{s-1} = \epsilon_s \tag{6-1}$$

for all $s$, the $\epsilon$'s being independent $N(0,1)$ variables as before. The joint distribution of
$x_1, x_2, \ldots, x_n$ is now

$$dF = \frac{(1-\rho^2)^{1/2}}{(2\pi)^{\frac12 n}} \exp\left[-\tfrac12\{x_1^2 + (1+\rho^2)(x_2^2 + \ldots + x_{n-1}^2) + x_n^2 - 2\rho(x_1x_2 + \ldots + x_{n-1}x_n)\}\right] dx_1 \ldots dx_n. \tag{6-2}$$
The sample estimate of $\rho$ is taken to be $r = c/c_0$, where

$$c = x_1x_2 + x_2x_3 + \ldots + x_{n-1}x_n, \qquad c_0 = \tfrac12 x_1^2 + x_2^2 + \ldots + x_{n-1}^2 + \tfrac12 x_n^2. \tag{6-3}$$

We have chosen the intra-class correlation coefficient between the series $x_1, x_2, \ldots, x_{n-1}$
and $x_2, x_3, \ldots, x_n$. Apart from its intuitive appeal it seems to lead to the simplest analysis
both here and in the next section where the mean is fitted. We now have

$$M(T_0, T) = E\, e^{T_0 c_0 + Tc} = (1-\rho^2)^{1/2}\,|B|^{-1/2},$$








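where

$$B = \begin{bmatrix}
1-T_0 & -(\rho+T) & 0 & 0 & \cdots & 0 & 0 & 0 \\
-(\rho+T) & 1+\rho^2-2T_0 & -(\rho+T) & 0 & \cdots & 0 & 0 & 0 \\
0 & -(\rho+T) & 1+\rho^2-2T_0 & -(\rho+T) & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & -(\rho+T) & 1+\rho^2-2T_0 & -(\rho+T) \\
0 & 0 & 0 & 0 & \cdots & 0 & -(\rho+T) & 1-T_0
\end{bmatrix}.$$

Then $|B| = G_n - 2(\rho^2-T_0)G_{n-1} + (\rho^2-T_0)^2 G_{n-2}$, where $G_n$ is the determinant of a matrix
similar to $B$ except that all its diagonal elements are $1+\rho^2-2T_0$. Also

$$G_s = (1+\rho^2-2T_0)\,G_{s-1} - (\rho+T)^2\,G_{s-2}$$

with $G_0 = 1$, $G_1 = 1+\rho^2-2T_0$. In this way it is found that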

(p+T)" | (p?— P| - ee | 
B| = —— i; 1— nle— 6-4 
Bl = xa —all!~ wen) | ~~ Gen] i 
with $z$ as before. It can again be shown that if $1-r^2$ is not small, omission of the term in $z^n$
within the bracket incurs an exponentially small error in the final approximation. For
brevity we omit it at this stage and write


|B|~ (e+Ty" ! = 3]. 


2(1 —2z?) “- 


(p+T) — 








whence, after some reduction, 
M(u-—rT,T)~ (1 — 2rz + 22)" (1— =p" (1-2) (6-5) 
"(1 =2pr-+p?—2u)i [(1 — pz) (1—pr—(r—p)2z)—u(1—2)] 


with z given by (3-4). Using (2-7) we get the result 





n(1 —p?)t _ 
hir)~ ami(1 — 2pr + pin a (z) (1 — 2rz + 22)8"-2 dz, 











where (1—2?)8 a (1—2*) (1—2pr+p?) _ |
z)=— 1+ med —4 
#@) = =a) (1—pr—@—p)O\" * m | —p2) (1 —pr——p)2) 
and $z = r + iw(1-r^2)^{1/2}$ $(-1<w<1)$.
The integral cannot be readily evaluated in closed form. We therefore expand $\phi(z)$ as
a power series in $z-r$ and integrate. The odd terms vanish and we get the series








: f(z) (1 — 2rz +22)8"-2 dz 


ye at n a. ’ (1-9)? pi¥(r) 
ei iT Qn—y)" ~ (sr Sqn -™)P')+ etm —1)mti) 

(== ger) 
28s!(n—1)(n+1)...(n+2s—3) 
Since ¢(r) = (1 —r?)#/(1 —pr) {1 +O(n-)} a first approximation to h(r) is 





+..0+ 





(6-6) 


a pete = am 
Mr)~ 2 {Jade 2 ee et 1 vi 


7. ACCURACY OF THE APPROXIMATION. RENORMALIZATION 


The remainder in (6-7) is relatively $O(n^{-1})$, though there has also been neglected an
exponentially small term which in samples of moderate size (say, $n$ about 15) may well be
comparable in magnitude. To study the accuracy of the approximation adequately would
require calculation of an upper bound to the error as in § 4. We do not attempt this here, but
instead merely assume that $n$ is large enough for the exponentially small term to be negligible
and content ourselves with a heuristic discussion of orders of magnitude. While admittedly
not entirely satisfactory it does give some insight into the magnitude of the errors involved,
and suggests a device for improving the accuracy without much extra trouble.

Observe first that the variance of $r$ is $O(n^{-1})$ so that values of $h(r)$ outside some range on
either side of $\rho$ which is $O(n^{-1/2})$ are negligibly small. Thus $r-\rho$ can be regarded as $O(n^{-1/2})$
over the effective range of $r$.

Consider the first neglected term in the expansion, the one which is $O(n^{-1})$. If $r$ is replaced
by $\rho$ the term is thereby altered by an amount $O(n^{-3/2})$, but it is now independent of $r$ and in
fact becomes part of the normalizing constant. One may therefore legitimately write


(=p) (1-yhea 
(1—pr) (1—2pr + p2)bn 


over the effective range of r, where K is an adjusted normalizing constant.* Moreover 


hr) ~ K - {1+O(n-3)} (7-1) 








Rew. Ws era : 
arcane (7-2) 
(l—p?)F(1—r?)t | (r—p)? a 
wr (1—pr) m3 2(1— = pape t OL p)?) 
l—r2  )1420-p*) 
* 1—2pr al O[(r —p)?). (7-3) 


* The device of renormalizing a saddlepoint approximation was used by Cox (1948) without con- 
sideration of the magnitude of the remainder. 


Consequently, to the same order, $h(r)$ may be written in Leipnik's form as


TaN+1)  (1—r2)kv-» 





mi -4 ’ 
bi mT ($N + 4) (1 —2pr +p Sli va 
2 
with N=n-1 oes (7-5) 


Actually when $\rho = 0$ the relative magnitude of the remainder reduces to $O(n^{-2})$, since $r^2$ is
now $O(n^{-1})$ and the coefficients of successive terms in the expansion of the integral become
functions of $r^2$ only. The distribution when $\rho = 0$ is approximately the same as that of an
intra-class correlation coefficient with known mean from a sample of $n$ pairs. (The
inter-class coefficient has one fewer degree of freedom.) But when $\rho \neq 0$ the distributions are not
the same to the order considered.

The same device can be used to simplify the approximation (5-4) for the circular coefficient
with fitted mean. If errors of magnitude $O(n^{-1})$ are tolerable the term $(1+r)/n$ in the
bracket may be ignored and the distribution renormalized to give

bie T(an+3) — (L=r) (1-1?) a sterte ais 
2nAT ($n) [n(1 —p) — (1 + p)] (1 — 2pr + p? her» 

So to $O(n^{-1})$ the distribution of $r$ with fitted mean is the same as the distribution found by
Jenkins (1954) for the first partial coefficient with known mean.








8. NON-CIRCULAR MARKOV PROCESS. UNKNOWN MEAN 
When the mean is unknown we estimate it by $\bar x = (\frac12 x_1 + x_2 + \ldots + x_{n-1} + \frac12 x_n)/(n-1)$ and
estimate $\rho$ by $r = C/C_0$, where

$$C = (x_1-\bar x)(x_2-\bar x) + \ldots + (x_{n-1}-\bar x)(x_n-\bar x) = c - (n-1)\bar x^2,$$

with $c$, $c_0$ defined as in § 6. Then

$$M(T_0, T) = (1-\rho^2)^{1/2}\,\left|B + \frac{2(T_0+T)}{n-1}\,mm'\right|^{-1/2},$$

where $B$ is the matrix of §6, and $m' = [\frac12, 1, 1, \ldots, 1, \frac12]$. The determinant can be evaluated
as in §5, with some extra manipulation to allow for the fact that the sum of the first or last
row of $B$ is not exactly half the sum of any other row. We omit the somewhat lengthy details
and record the result that
mam — 72 sel “a 2\4(n—1) 
Miu—rP,7)~—__(=e R= 292) (1 Bree 
(1—p) (1-2) [1 —pr—(r—p)2z—u(1 —2*)] (1 — 2or + p?— 2u)kr—) 
ely (+e)? (1 = 2r2 +24) [(@—p) (1=p2) (1-7) + ul -2*)] % 
(n— 1) (1 —z?) (1—2pr+p?—2u) [1 —pr—(r—p)z—u(1—2z?)]} 
where a term in z”~! has been ignored for the reasons stated, and z is given by (3-6). 
If we are content with an approximation having remainder relatively O(a-#) the last 
factor may be ignored, the dominant term being taken as before in the expansion of the 
integral for h(r) and subsequently renormalized. Ultimately it is found that 


K(1—p?)t (1—12)ke-9 (1 —r) 
h(r) ~ 
(1—pr) (1—2pr + p2)in—3) 











{1+0(n-4)}, 
where $K$ is a normalizing constant. Following the method of §7 we can replace it to the
same order of accuracy by
T(3N +3) (l—r)(1—r?)t—1 


ee. i. 4 
Kl anh T( ($4) [N(1—p)—(1+ )] (l—2pr+p? nian +O(n-*)} 





with N = n—1+ ?/(1—p?). 

The effect of employing non-circular definitions is therefore to replace n by N in the 
approximate distribution, to O(n-*), whether the mean is known or fitted. But in the latter 
case, even when p = 0 the approximate distribution of r is not the same as that of the intra- 
class correlation coefficient with fitted mean. 


9. CIRCULAR AUTOREGRESSIVE PROCESS OF ORDER m. KNOWN MEAN 


The discussion is now extended to cover the mth order autoregressive process. For simplicity 

only the circularly defined process and statistics are considered. The non-circular case could 

be dealt with by the methods already used, but the labour involved would be considerable. 
The process is defined to be 


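$$x_s + \alpha_1 x_{s-1} + \alpha_2 x_{s-2} + \ldots + \alpha_m x_{s-m} = \epsilon_s \quad (s = 1, 2, \ldots, n), \qquad x_s \equiv x_{n+s}, \tag{9-1}$$

where $m$ is $O(1)$ with respect to $n$ and the $\epsilon$'s are independent $N(0, 1)$ variables. It may be
concisely written as

$$Ax = \epsilon,$$

where $A$ is a circulant with first row $(1, 0, 0, \ldots, 0, \alpha_m, \alpha_{m-1}, \ldots, \alpha_1)$. Its determinant has
the value

$$|A| = \prod_{j=1}^{n} (\omega_j^{n} + \alpha_1\omega_j^{n-1} + \ldots + \alpha_m\omega_j^{n-m}) = \prod_{t=1}^{m} (1-\theta_t^n), \tag{9-2}$$

where $\omega_j = e^{2\pi ij/n}$ and $\theta_1, \ldots, \theta_m$ are the roots of $\theta^m + \alpha_1\theta^{m-1} + \ldots + \alpha_m = 0$.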
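The product formula (9-2) for the determinant of the circulant $A$ can be checked numerically in a small case. The sketch below takes $m = 2$ as an illustration (so the roots $\theta_1, \theta_2$ come from a quadratic), builds the circulant whose row $s$ carries 1 in column $s$ and $\alpha_k$ in column $s-k$ (indices taken modulo $n$), and compares its determinant with $\prod_t (1-\theta_t^n)$. The function names and the coefficient values are assumptions chosen for the example.

```python
import cmath


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return det


def autoregressive_circulant(n, alphas):
    """Row s has 1 in column s and alphas[k-1] in column s - k (indices modulo n)."""
    m = [[0.0] * n for _ in range(n)]
    for s in range(n):
        m[s][s] += 1.0
        for k, alpha in enumerate(alphas, start=1):
            m[s][(s - k) % n] += alpha
    return m


def product_formula_order_two(n, alpha1, alpha2):
    """prod_t (1 - theta_t**n), theta_t being the roots of theta^2 + alpha1*theta + alpha2 = 0."""
    disc = cmath.sqrt(alpha1 * alpha1 - 4.0 * alpha2)
    roots = ((-alpha1 + disc) / 2.0, (-alpha1 - disc) / 2.0)
    value = 1.0
    for theta in roots:
        value *= 1.0 - theta**n
    return value
```

When the roots are complex conjugates the product is real up to rounding, so only its real part need be compared with the determinant.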
The joint distribution of the $x$'s is then

$$dF = \frac{\prod_{t=1}^{m}(1-\theta_t^n)}{(2\pi)^{\frac12 n}}\, e^{-\frac12 x'A'Ax} \prod_{j=1}^{n} dx_j. \tag{9-3}$$
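Note that $A'A$ is a circulant with first row

$$(1+\alpha_1^2+\alpha_2^2+\ldots+\alpha_m^2,\ \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3+\ldots+\alpha_{m-1}\alpha_m,\ \alpha_2+\alpha_1\alpha_3+\alpha_2\alpha_4+\ldots+\alpha_{m-2}\alpha_m,\ \ldots,$$
$$\alpha_{m-1}+\alpha_1\alpha_m,\ \alpha_m,\ 0, \ldots, 0,\ \alpha_m,\ \alpha_{m-1}+\alpha_1\alpha_m,\ \ldots,\ \alpha_1+\alpha_1\alpha_2+\ldots+\alpha_{m-1}\alpha_m).$$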


We consider the joint distribution of the $m$ circularly defined coefficients $r_s = c_s/c_0$
$(s = 1, 2, \ldots, m)$, where

$$c_s = x_1x_{s+1} + x_2x_{s+2} + \ldots + x_{n-s}x_n + x_{n-s+1}x_1 + \ldots + x_nx_s.$$

The moment-generating function for the $c$'s is

$$M(T_0, T_1, \ldots, T_m) = E\, e^{T_0c_0 + T_1c_1 + \ldots + T_mc_m} = \prod_{t=1}^{m}(1-\theta_t^n)\,|A'A - \Theta|^{-1/2}, \tag{9-4}$$

$\Theta$ being a circulant with first row $(2T_0, T_1, T_2, \ldots, T_m, 0, \ldots, 0, T_m, T_{m-1}, \ldots, T_1)$. When $m = 1$,
$A'A - \Theta$ is just $A$ of §3 with $\alpha_1 = -\rho$.

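Let us introduce new variables $P, a_1, a_2, \ldots, a_m$ related to the elements of $A'A - \Theta$ in the
following way:

$$\begin{aligned}
1+\alpha_1^2+\alpha_2^2+\ldots+\alpha_m^2 - 2T_0 &= P(1+a_1^2+a_2^2+\ldots+a_m^2), \\
\alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3+\ldots+\alpha_{m-1}\alpha_m - T_1 &= P(a_1+a_1a_2+\ldots+a_{m-1}a_m), \\
\alpha_2+\alpha_1\alpha_3+\alpha_2\alpha_4+\ldots+\alpha_{m-2}\alpha_m - T_2 &= P(a_2+a_1a_3+\ldots+a_{m-2}a_m), \\
&\ \ \vdots \\
\alpha_{m-1}+\alpha_1\alpha_m - T_{m-1} &= P(a_{m-1}+a_1a_m), \\
\alpha_m - T_m &= Pa_m.
\end{aligned} \tag{9-5}$$

Then

$$|A'A - \Theta| = P^n \prod_{t=1}^{m} (1-\phi_t^n)^2,$$

where $\phi_1, \ldots, \phi_m$ are the roots of $\phi^m + a_1\phi^{m-1} + \ldots + a_m = 0$. In the special case $m = 1$ this
reduces to $P^n[1-(-a_1)^n]^2$ with

$$P = (\alpha_1 - T_1)/a_1, \qquad a_1 + \frac{1}{a_1} = \frac{1+\alpha_1^2-2T_0}{\alpha_1 - T_1},$$

which is identical with (3-2) if $\rho = -\alpha_1$, $z = -a_1$, $T = T_1$. The present method is thus
a natural extension of the one used for the Markov process.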
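Let $u = T_0 + r_1T_1 + r_2T_2 + \dots + r_mT_m$, and write

\[ Q(a, r) = (1 + a_1^2 + \dots + a_m^2) + 2r_1(a_1 + a_1a_2 + \dots + a_{m-1}a_m) + \dots + 2r_ma_m = [1, a']\, R_m \begin{bmatrix} 1 \\ a \end{bmatrix} = 1 + 2a'r + a'R_{m-1}a, \]

where $a' = [a_1, a_2, \dots, a_m]$, $r' = [r_1, r_2, \dots, r_m]$
and

\[ R_m = \begin{bmatrix} 1 & r_1 & r_2 & \dots & r_m \\ r_1 & 1 & r_1 & \dots & r_{m-1} \\ r_2 & r_1 & 1 & \dots & r_{m-2} \\ \vdots & & & & \vdots \\ r_m & r_{m-1} & r_{m-2} & \dots & 1 \end{bmatrix} = \begin{bmatrix} 1 & r' \\ r & R_{m-1} \end{bmatrix}. \]

From (9-5) we find

\[ P = \frac{Q(\alpha, r) - 2u}{Q(a, r)} \tag{9-6} \]

and (9-4) becomes

\[ M(u - r_1T_1 - \dots - r_mT_m, T_1, \dots, T_m) = \frac{Q^{\frac{1}{2}n}(a, r)}{[Q(\alpha, r) - 2u]^{\frac{1}{2}n}} \prod_{t=1}^{m} \frac{1 - \theta_t^{n}}{1 - \phi_t^{n}}. \tag{9-7} \]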


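The inversion formula (2-10) can now be used with $a_j$ in the role of $z_j$. The successive
transformations

\[ T_0, T_1, \dots, T_m \rightarrow P, a_1, a_2, \dots, a_m \rightarrow u, a_1, a_2, \dots, a_m \]

have Jacobians

\[
\frac{\partial(T_0, T_1, \dots, T_m)}{\partial(P, a_1, \dots, a_m)} = (-)^{m+1}P^m
\begin{vmatrix}
\tfrac{1}{2}(1 + a_1^2 + \dots + a_m^2) & a_1 & a_2 & \dots & a_{m-1} & a_m \\
a_1 + a_1a_2 + \dots + a_{m-1}a_m & 1 + a_2 & a_1 + a_3 & \dots & a_{m-2} + a_m & a_{m-1} \\
a_2 + a_1a_3 + \dots + a_{m-2}a_m & a_3 & 1 + a_4 & \dots & a_{m-3} & a_{m-2} \\
\vdots & & & & & \vdots \\
a_{m-1} + a_1a_m & a_m & 0 & \dots & 1 & a_1 \\
a_m & 0 & 0 & \dots & 0 & 1
\end{vmatrix}
= \tfrac{1}{2}(-)^{m+1}P^mJ(a),
\]

where

\[
J(a) =
\begin{vmatrix}
1 & a_1 & a_2 & \dots & a_{m-1} & a_m \\
a_1 & 1 + a_2 & a_1 + a_3 & \dots & a_{m-2} + a_m & a_{m-1} \\
a_2 & a_3 & 1 + a_4 & \dots & a_{m-3} & a_{m-2} \\
\vdots & & & & & \vdots \\
a_{m-1} & a_m & 0 & \dots & 1 & a_1 \\
a_m & 0 & 0 & \dots & 0 & 1
\end{vmatrix},
\]

and

\[ \frac{\partial(P, a_1, \dots, a_m)}{\partial(u, a_1, \dots, a_m)} = \frac{\partial P}{\partial u} = -\frac{2}{Q(a, r)}, \]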
and from (2-10) the joint probability density for $r_1, \dots, r_m$ is
he mn 
_ ae Nes = | a cee te = (eh 
— 1-9") 


t 


The paths of integration in the planes of $a_1, \dots, a_m$ are chosen to be the lines of steepest
descent of $Q(a, r)$ passing through the saddlepoints $\hat a_1, \dots, \hat a_m$, which satisfy

\[ \partial Q/\partial a_j = 0 \quad (j = 1, 2, \dots, m), \]

i.e.

\[ r + R_{m-1}\hat a = 0. \tag{9-10} \]

The saddlepoints are thus the usual least-squares estimators of the $\alpha_j$'s. Also

\[ Q(\hat a, r) = 1 + r'\hat a = 1 - r'R_{m-1}^{-1}r = |R_m|/|R_{m-1}|. \]

Since $Q(\hat a, r)$ is real, $Q(a, r)$ must remain real on the lines of steepest descent, i.e. if $a = \xi + i\eta$
they must be such that $\mathscr{I}Q(a, r) = 2\eta'(r + R_{m-1}\xi) = 0$.
The straight lines $\xi_j = \hat a_j$ satisfy this condition. On them, $a_j = \hat a_j + i\eta_j$ and

\[ Q(a, r) = \frac{|R_m|}{|R_{m-1}|} - \eta'R_{m-1}\eta, \tag{9-11} \]

which is a decreasing function of each $|\eta_j|$. They are therefore the required paths of
integration.
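The statements around (9-10) and (9-11) can be illustrated numerically. This is a sketch under assumed inputs: the serial correlations `r` and the offsets `eta` are arbitrary illustrative values, and `toeplitz_corr` is a helper named here for convenience.

```python
import numpy as np

def toeplitz_corr(first_row):
    """Symmetric Toeplitz matrix with the given first row (first entry 1)."""
    k = len(first_row)
    return np.array([[first_row[abs(i - j)] for j in range(k)] for i in range(k)])

r = np.array([0.6, 0.2, -0.1])                            # illustrative r_1, r_2, r_3
R_m = toeplitz_corr(np.concatenate(([1.0], r)))           # (m+1) x (m+1)
R_m1 = toeplitz_corr(np.concatenate(([1.0], r[:-1])))     # R_{m-1}, m x m

a_hat = -np.linalg.solve(R_m1, r)     # saddlepoint: r + R_{m-1} a = 0, as in (9-10)

def Q(a):
    """Q(a, r) = 1 + 2 a'r + a' R_{m-1} a (bilinear, no complex conjugation)."""
    return 1 + 2 * (a @ r) + a @ R_m1 @ a

q_saddle = Q(a_hat)
q_ratio = np.linalg.det(R_m) / np.linalg.det(R_m1)        # |R_m| / |R_{m-1}|

eta = np.array([0.05, -0.02, 0.03])
q_on_path = Q(a_hat + 1j * eta)       # a point on the lines xi_j = a_hat_j
print(q_saddle, q_ratio, q_on_path)
```

On the path the imaginary part of `q_on_path` vanishes to rounding error and its real part equals `q_saddle - eta @ R_m1 @ eta`, in line with (9-11).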
Since $Q(a, r) = 0$ at the end-points of the paths (corresponding to $\mathscr{I}(T_j) = \pm\infty$ in the
$T_j$ planes), the domain of integration is $\eta'R_{m-1}\eta < |R_m|/|R_{m-1}|$.

We shall again ignore the factors $1 - \theta_t^n$, $1 - \phi_t^n$ since it is not difficult to establish that
except in critical cases $|\theta_t| < 1$, $|\phi_t| < 1$ for all $t$. (This is equivalent as before to a smoothing
operation.) The factor $J(a)$ is a polynomial of degree $m + 1$ in the $\eta_j$'s, and if retained would
lead to an approximation to $h(r)$ with an exponentially small remainder. But for simplicity
we shall replace it by the constant term $J(\hat a)$. The resulting approximation to $h(r)$, when
renormalized, will be in error by a factor $1 + O(n^{-1})$. We then find


1 T'(4n) Sf. { | R,, | , -_ 
hir)~ — Samed —7'R,,_ dy,...d9,- (9°12 

(r)~ (2a) T'(4n —m) am (A,r) [R,,-1| NRn-1N 1 -+- 4m (9°12) 
The integral is readily transformed to the Dirichlet form and has the value 


rien ar [Ry = 





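The factor $J(\hat a)$ can be reduced to some extent as follows. Write it in partitioned form as

\[ J(\hat a) = \begin{vmatrix} 1 & \hat a' \\ \hat a & D \end{vmatrix}, \]

where $D$ is an $m \times m$ matrix having elements $d_{jk} = \hat a_{k-j} + \hat a_{k+j}$, if we define $\hat a_0 = 1$ and $\hat a_j = 0$
whenever $j < 0$ or $j > m$. Also (9-10), which is

\[ r_j + \sum_{k=1}^{m} r_{|j-k|}\hat a_k = 0, \]

can be rearranged as

\[ \hat a_j + \sum_{k=1}^{m} d_{kj}r_k = 0 \]

or

\[ \hat a + D'r = 0. \tag{9-13} \]

Hence

\[ J(\hat a) = (1 + r'\hat a)\,|D| = \frac{|R_m|}{|R_{m-1}|}\,|D|, \]

and the approximation to $h(r)$ is

\[ h(r) \sim K\, \frac{|R_m|^{\frac{1}{2}(n-m)}\, |D|}{|R_{m-1}|^{\frac{1}{2}(n-m+1)}\, Q^{\frac{1}{2}n}(\alpha, r)}\, \{1 + O(n^{-1})\}. \tag{9-14} \]

Since $d_{mj} = 0$ $(j \neq m)$, $d_{mm} = 1$, $|D|$ reduces to an $(m-1)$th order determinant whose
elements are functions of the $\hat a_j$'s. As will appear in the next section there is no need to express
it in terms of the $r_j$'s.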


10. THE JOINT DISTRIBUTION OF THE LEADING PARTIAL SERIAL 
CORRELATION COEFFICIENTS 
The distribution simplifies considerably when expressed in terms of partial correlations. We
use the convenient notation $r_{j.}$ to denote the partial correlation coefficient between $x_t$ and
$x_{t+j}$ conditional on fixed $x_{t+1}, \dots, x_{t+j-1}$, which may be called the $j$th leading partial serial
correlation coefficient. It is a standard result that

\[ |R_m| = (1 - r_{1.}^2)^m (1 - r_{2.}^2)^{m-1} (1 - r_{3.}^2)^{m-2} \dots (1 - r_{m.}^2) \tag{10-1} \]

and also that $\hat a_m = -r_{m.}$.

The first step is to make a transformation from the $r_j$'s to the $\hat a_j$'s. From (9-10) and (9-13)
we have $0 = r + R_{m-1}\hat a = \hat a + D'r = q$ say, so that

\[ 0 = \frac{\partial q}{\partial \hat a} + \frac{\partial q}{\partial r}\frac{\partial r}{\partial \hat a} = R_{m-1} + D'\frac{\partial r}{\partial \hat a} \]

and the required Jacobian is

\[ \left|\frac{\partial r}{\partial \hat a}\right| = \frac{|R_{m-1}|}{|D|}. \tag{10-2} \]








Next we transform from the $\hat a_j$'s to the $r_{j.}$'s. Let us introduce the notation $\hat a_j = \hat a_{j,m}$ to
exhibit explicitly the fact that a process of order $m$ is being considered. In this notation
$\hat a_{j,j} = -r_{j.}$ and (9-10) is $r_j + \sum_{i=1}^{m} r_{|i-j|}\hat a_{i,m} = 0$. Subtracting the corresponding equation for
order $m - 1$ we obtain

\[ \sum_{i=1}^{m-1} r_{|i-j|}(\hat a_{i,m} - \hat a_{i,m-1}) + r_{m-j}\hat a_{m,m} = 0. \]

Comparison with

\[ \sum_{i=1}^{m-1} r_{|i-j|}\hat a_{m-i,m-1} + r_{m-j} = 0 \]

gives

\[ \hat a_{j,m} = \hat a_{j,m-1} + \hat a_{m-j,m-1}\hat a_{m,m}. \tag{10-3} \]

Keeping $\hat a_{m,m}$ unaltered, we can use (10-3) to transform the remaining variables

\[ \hat a_{1,m}, \dots, \hat a_{m-2,m}, \hat a_{m-1,m} \]

to

\[ \hat a_{1,m-1}, \hat a_{2,m-1}, \dots, \hat a_{m-2,m-1}, \hat a_{m-1,m-1}. \]

Repetition of the procedure ultimately reduces the variables to $\hat a_{j,j} = -r_{j.}$ as required.
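The recursion (10-3) can be run forwards to build the saddlepoints order by order, and the result compared with a direct solution of (9-10) and with the product formula (10-1). The sketch below uses illustrative serial correlations; the function name and the explicit formula for $\hat a_{m,m}$ (obtained by substituting (10-3) into the $m$th equation of (9-10)) are written out here as assumptions of the sketch rather than quoted from the paper.

```python
import numpy as np

def saddlepoints_by_recursion(r):
    """levels[m-1] holds [a_{1,m}, ..., a_{m,m}], built up with recursion (10-3)."""
    r = np.asarray(r, dtype=float)
    rho = np.concatenate(([1.0], r))            # rho[k] = r_k, rho[0] = 1
    levels = [np.array([-r[0]])]                # order 1: a_{1,1} = -r_1
    for m in range(2, len(r) + 1):
        prev = levels[-1]                       # a_{1,m-1}, ..., a_{m-1,m-1}
        num = rho[m] + sum(prev[i - 1] * rho[m - i] for i in range(1, m))
        den = 1.0 + sum(prev[i - 1] * rho[i] for i in range(1, m))
        a_mm = -num / den
        head = [prev[j - 1] + prev[m - j - 1] * a_mm for j in range(1, m)]   # (10-3)
        levels.append(np.array(head + [a_mm]))
    return levels

r = np.array([0.6, 0.2, -0.1])                  # illustrative r_1, r_2, r_3
m = len(r)
levels = saddlepoints_by_recursion(r)

rho = np.concatenate(([1.0], r))
R_m1 = np.array([[rho[abs(i - j)] for j in range(m)] for i in range(m)])
R_m = np.array([[rho[abs(i - j)] for j in range(m + 1)] for i in range(m + 1)])
a_direct = -np.linalg.solve(R_m1, r)            # direct solution of (9-10)

partials = np.array([-lev[-1] for lev in levels])            # r_{j.} = -a_{j,j}
det_product = np.prod([(1 - p ** 2) ** (m - j) for j, p in enumerate(partials)])
print(levels[-1], a_direct, np.linalg.det(R_m), det_product)
```

Here `enumerate` starts at 0, so the exponent `m - j` corresponds to $m + 1 - j$ for the $j$th partial correlation, as in (10-1).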


The Jacobian of (10-3) is the $(m-1)$th order determinant

\[
H_m = \begin{vmatrix}
1 & 0 & \dots & 0 & \hat a_{m,m} \\
0 & 1 & \dots & \hat a_{m,m} & 0 \\
\vdots & & & & \vdots \\
0 & \hat a_{m,m} & \dots & 1 & 0 \\
\hat a_{m,m} & 0 & \dots & 0 & 1
\end{vmatrix},
\]

which has a central element $1 + \hat a_{m,m}$ when $m$ is even. Its value is

\[
H_m = \begin{cases}
(1 - r_{m.}^2)^{\frac{1}{2}(m-1)}, & m \text{ odd}, \\
(1 - r_{m.})(1 - r_{m.}^2)^{\frac{1}{2}m-1}, & m \text{ even},
\end{cases}
\]

and the Jacobian for the ultimate reduction to the required variables is $H_mH_{m-1} \dots H_3H_2$.
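The stated value of $H_m$ can be checked by forming the matrix of partial derivatives of (10-3), which is the identity plus $\hat a_{m,m}$ times the exchange (anti-diagonal) matrix. A minimal sketch, with an arbitrary illustrative value of $r_{m.}$ and function names chosen here:

```python
import numpy as np

def H(m, a_mm):
    """Determinant of the (m-1) x (m-1) matrix I + a_mm * (exchange matrix)."""
    k = m - 1
    return np.linalg.det(np.eye(k) + a_mm * np.fliplr(np.eye(k)))

def H_closed_form(m, r_m):
    """Closed form quoted in the text, with r_m = -a_{m,m}."""
    if m % 2 == 1:
        return (1 - r_m ** 2) ** ((m - 1) / 2)
    return (1 - r_m) * (1 - r_m ** 2) ** (m / 2 - 1)

r_m = 0.3                                        # illustrative partial correlation
checks = [(m, H(m, -r_m), H_closed_form(m, r_m)) for m in range(2, 8)]
for row in checks:
    print(row)
```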
Combining this with (10-2) and using (10-1) we finally obtain from (9-14) the approximate
joint probability density of the $r_{j.}$'s:

\[ \frac{K}{Q^{\frac{1}{2}n}(\alpha, r)} \prod_{j \text{ odd}} (1 - r_{j.}^2)^{\frac{1}{2}(n-1)} \prod_{j \text{ even}} (1 - r_{j.})(1 - r_{j.}^2)^{\frac{1}{2}n-1}\, \{1 + O(n^{-1})\}, \tag{10-5} \]

$K$ being a normalizing constant. It is apparently not possible to decompose $Q(\alpha, r)$ into
factors depending on the successive partial correlations.

The most important application of the distribution (provided the circularity assumption
can be tolerated) is in testing the hypothesis that $\alpha_m = 0$ by means of the statistic $r_{m.}$. When
$\alpha_m = 0$, $Q(\alpha, r)$ does not contain $r_m$, so if $m$ is odd $r_{m.}$ has density proportional to $(1 - r_{m.}^2)^{\frac{1}{2}(n-1)}$
to $O(n^{-1})$, the same as the null distribution of $r_1$, while if $m$ is even its density is proportional
to $(1 - r_{m.})(1 - r_{m.}^2)^{\frac{1}{2}n-1}$ to $O(n^{-1})$, agreeing with that of $r_{2.}$ found by Jenkins (1954).

It will be remembered that if $J(a)$ had not been replaced by $J(\hat a)$ in (9-9) the remainder
would have been exponentially small. When $m = 2$, $J(a)$ is a cubic in $a - \hat a$ but the odd terms
vanish on integration and there is just one extra term. Calculation shows that the term can
be absorbed into the normalizing constant so that the approximation has again an
exponentially small remainder. But when $m \geq 3$ this is no longer the case.


11. CIRCULAR AUTOREGRESSIVE PROCESS OF ORDER m. UNKNOWN MEAN


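Finally, the effect of fitting the mean is considered in the manner of § 5. We require the joint
distribution of $r_s = C_s/C_0$ $(s = 1, 2, \dots, m)$, where $C_s = c_s - n\bar x^2$. The moment-generating
function is

\[ M(T_0, T_1, \dots, T_m) = \prod_{t=1}^{m} (1 - \theta_t^{n}) \left| A'A - \Theta + \frac{2}{n}(T_0 + T_1 + \dots + T_m)\,\mathbf{1}\mathbf{1}' \right|^{-\frac{1}{2}}, \tag{11-1} \]

where $\mathbf{1}$ denotes the column vector of $n$ units. Each row sum of $A'A - \Theta$ is

\[ (1 + \alpha_1 + \dots + \alpha_m)^2 - 2(T_0 + T_1 + \dots + T_m) = P(1 + a_1 + \dots + a_m)^2 \]

from (9-5), and as before the determinant reduces to

\[ \frac{(1 + \alpha_1 + \dots + \alpha_m)^2}{(1 + a_1 + \dots + a_m)^2}\, P^{n-1} \prod_{t=1}^{m} (1 - \phi_t^{n})^2, \]

so that

\[ M(u - r_1T_1 - \dots - r_mT_m, T_1, \dots, T_m) = \frac{(1 + a_1 + \dots + a_m)}{(1 + \alpha_1 + \dots + \alpha_m)}\, \frac{Q^{\frac{1}{2}(n-1)}(a, r)}{[Q(\alpha, r) - 2u]^{\frac{1}{2}(n-1)}} \prod_{t=1}^{m} \frac{1 - \theta_t^{n}}{1 - \phi_t^{n}}. \]

The effect on $h(r)$ of having fitted the mean is thus to reduce $n$ to $n - 1$ in (9-12), and to
introduce the extra factor $1 + a_1 + \dots + a_m$ into the integrand, with a suitable readjustment
of the normalizing constant. For an approximation with error $O(n^{-1})$, the factor can again
be replaced by $1 + \hat a_1 + \dots + \hat a_m$ and taken outside the integral.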


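We have therefore only to evaluate $1 + \hat a_1 + \dots + \hat a_m$ in terms of the $r_{j.}$'s (now with fitted
means). From (10-3) we have

\[
\begin{aligned}
1 + \hat a_{1,m} + \dots + \hat a_{m,m} &= (1 + \hat a_{1,m-1} + \dots + \hat a_{m-1,m-1})(1 + \hat a_{m,m}) \\
&= \dots = (1 + \hat a_{m,m})(1 + \hat a_{m-1,m-1}) \dots (1 + \hat a_{1,1})
\end{aligned}
\tag{11-3}
\]

or, in the present notation,

\[ 1 + \hat a_1 + \dots + \hat a_m = (1 - r_{1.})(1 - r_{2.}) \dots (1 - r_{m.}). \tag{11-4} \]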
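Identity (11-4) can be verified directly for any admissible set of serial correlations. In the sketch below the values of `r` are illustrative, and each partial correlation is obtained as minus the last least-squares coefficient of the corresponding lower-order fit; `a_hat` is a helper introduced only for this check.

```python
import numpy as np

def a_hat(r):
    """Solve r + R_{m-1} a = 0 for the serial correlations r = (r_1, ..., r_m)."""
    r = np.asarray(r, dtype=float)
    m = len(r)
    rho = np.concatenate(([1.0], r))
    R = np.array([[rho[abs(i - j)] for j in range(m)] for i in range(m)])
    return -np.linalg.solve(R, r)

r = [0.5, 0.1, -0.2, 0.05]                       # illustrative r_1, ..., r_4
partials = [-a_hat(r[:j])[-1] for j in range(1, len(r) + 1)]   # r_{j.} = -a_{j,j}

lhs = 1 + a_hat(r).sum()                         # 1 + a_1 + ... + a_m
rhs = np.prod([1 - p for p in partials])         # (1 - r_1.) ... (1 - r_m.)
print(lhs, rhs)
```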
Hence the approximate density for the $r_{j.}$'s with fitted means is, from (10-5),

\[ \frac{K}{Q^{\frac{1}{2}(n-1)}(\alpha, r)} \prod_{j \text{ odd}} (1 - r_{j.})(1 - r_{j.}^2)^{\frac{1}{2}(n-2)} \prod_{j \text{ even}} (1 - r_{j.})^2(1 - r_{j.}^2)^{\frac{1}{2}(n-3)}\, \{1 + O(n^{-1})\}, \tag{11-5} \]

where $K$ is a new normalizing constant.

Thus in testing $\alpha_m = 0$ when the mean is unknown, $r_{m.}$ can be taken as approximately
distributed with density proportional to

\[ (1 - r_{m.})(1 - r_{m.}^2)^{\frac{1}{2}(n-2)} \quad \text{when } m \text{ is odd}, \]

and

\[ (1 - r_{m.})^2(1 - r_{m.}^2)^{\frac{1}{2}(n-3)} \quad \text{when } m \text{ is even}. \]


Jenkins (1954) gives the values $E(r_{2.}) \sim -2/(n-1)$ and $E(r_{2.}^2) \sim 1/(n-2)$ for the first two
moments of $r_{2.}$ with fitted mean, while Dixon gives the results

\[ E(1 - r_{2.}^2) = (n-1)/n, \qquad E(1 - r_{2.}^2)^2 = (n-1)(n+1)/n(n+2). \]

Their values for $E(r_{2.}^2)$ differ by an amount $O(n^{-2})$. Since the error factor in our approximation
is actually of the form $1 + pr/n + qr^2/n + \dots$, it is easy to show that its first and second
moments are respectively in error by amounts $O(n^{-2})$ and $O(n^{-3})$. We find

\[ E(r_{2.}) \sim -2/(n+1) = -2/n + O(n^{-2}) \]

which agrees with Jenkins's value to the required order, but

\[ E(r_{2.}^2) \sim (n+5)/(n+1)(n+2) \sim 1/n + 2/n^2 + O(n^{-3}) \]

which agrees with Jenkins's result to $O(n^{-2})$ but not with Dixon's. There is a similar
disagreement with Dixon's $E(r_{2.}^4)$.
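The two approximate moments quoted above follow from the density proportional to $(1 - r)^2(1 - r^2)^{\frac{1}{2}(n-3)}$ on $(-1, 1)$, the form stated for even $m$ with fitted mean. A short numerical sketch, assuming that density and an illustrative odd $n$ (so that the integrand is a polynomial and Gauss-Legendre quadrature is exact up to rounding):

```python
import numpy as np

def first_two_moments(n, nodes=64):
    """Mean and second moment of the density proportional to
    (1 - r)^2 (1 - r^2)^((n - 3)/2) on [-1, 1]."""
    x, w = np.polynomial.legendre.leggauss(nodes)    # Gauss-Legendre rule on [-1, 1]
    dens = (1 - x) ** 2 * (1 - x ** 2) ** ((n - 3) / 2)
    norm = np.sum(w * dens)
    return np.sum(w * x * dens) / norm, np.sum(w * x ** 2 * dens) / norm

n = 21                                               # illustrative series length
mean, second = first_two_moments(n)
print(mean, -2 / (n + 1))
print(second, (n + 5) / ((n + 1) * (n + 2)))
```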


I am indebted to Miss P. A. Johnson for the computation of Table 1.


TESTS OF HYPOTHESES IN THE LINEAR
AUTOREGRESSIVE MODEL


II. NULL DISTRIBUTIONS FOR HIGHER ORDER SCHEMES: 
NON-NULL DISTRIBUTIONS 


By G. M. JENKINS 
University College London* 


1. INTRODUCTION 


It is known that the likelihood ratio criterion $v_k$ for testing an autoregressive scheme of
order $k-1$ (written as A.R.$(k-1)$) against the alternative hypothesis that it is an A.R.$(k)$
is given by the partial serial correlation between $x_t$ and $x_{t-k}$ when the effects of the
intermediate variables have been eliminated. In a previous publication (Jenkins, 1954, to be
referred to as I), it was shown that the smoothed form of the distribution of $v_2$ is given by

\[ p(v_2) = \frac{\Gamma(\frac{1}{2}n + \frac{1}{2})}{\Gamma(\frac{1}{2})\,\Gamma(\frac{1}{2}n)}\, (1 - v_2^2)^{\frac{1}{2}(n-2)} (1 - v_2), \tag{1-1} \]

where $v_2 = (r_2 - r_1^2)/(1 - r_1^2)$ and $r_k$ is the circular serial correlation of lag $k$ uncorrected for the mean.

The distribution was first derived for a random scheme and then shown to have the same
form in an A.R.(1) so that it may be used to test an A.R.(1) against an A.R.(2).

In the present paper, it is proposed to extend this to the case where the serial correlations
are corrected for the mean and, also, to construct the relevant distributions for testing
higher order schemes. It has been found that up to order 4, the distributions when there is
no mean correction are alternately given by

\[ p(x) = \frac{\Gamma(\frac{1}{2}n + 1)}{\Gamma(\frac{1}{2})\,\Gamma(\frac{1}{2}n + \frac{1}{2})}\, (1 - x^2)^{\frac{1}{2}(n-1)} \tag{1-2} \]

and (1-1) according as to whether the order of the alternative hypothesis is odd or even.
This result has been proved in the general case by Daniels (1956) using the more elegant
method of saddlepoint approximation.

When the serial correlations are corrected for the mean, it will be shown that the
distributions of the lower order partial serial correlations are alternately given by (1-1) and

\[ p(x) = \{B(\tfrac{1}{2}, \tfrac{1}{2}n - \tfrac{1}{2}) + B(\tfrac{3}{2}, \tfrac{1}{2}n - \tfrac{1}{2})\}^{-1} (1 - x^2)^{\frac{1}{2}(n-3)} (1 - x)^2, \tag{1-3} \]

depending on whether the order of the alternative hypothesis is odd or even. This result has
also been proved in the general case by Daniels (1956). The distributions given by (1-2), (1-1)
and (1-3) have been designated by type $\alpha$, type $\beta$ and type $\gamma$ respectively.

In the latter half of the paper, the exact and smoothed joint distributions of circular
serial correlations (with and without mean correction) are derived, and detailed treatment
of these distributions given for the Markoff scheme. The Yule scheme will be discussed
at a later stage.

* Now at the Royal Aircraft Establishment, Farnborough.
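As a check on the normalizing constants of the type $\alpha$, type $\beta$ and type $\gamma$ forms, each density can be integrated numerically over $(-1, 1)$. The sketch below assumes the forms $(1 - x^2)^{\frac{1}{2}(n-1)}$, $(1 - x^2)^{\frac{1}{2}(n-2)}(1 - x)$ and $(1 - x^2)^{\frac{1}{2}(n-3)}(1 - x)^2$ with the constants written out in the code; the value of $n$ is illustrative.

```python
import numpy as np
from math import gamma

def beta_fn(p, q):
    """Beta function B(p, q) via gamma functions."""
    return gamma(p) * gamma(q) / gamma(p + q)

def type_alpha(x, n):   # form (1-2)
    c = gamma(n / 2 + 1) / (gamma(0.5) * gamma(n / 2 + 0.5))
    return c * (1 - x ** 2) ** ((n - 1) / 2)

def type_beta(x, n):    # form (1-1)
    c = gamma(n / 2 + 0.5) / (gamma(0.5) * gamma(n / 2))
    return c * (1 - x ** 2) ** ((n - 2) / 2) * (1 - x)

def type_gamma(x, n):   # form (1-3)
    k = beta_fn(0.5, n / 2 - 0.5) + beta_fn(1.5, n / 2 - 0.5)
    return (1 - x ** 2) ** ((n - 3) / 2) * (1 - x) ** 2 / k

x, w = np.polynomial.legendre.leggauss(200)      # quadrature nodes and weights
n = 20
totals = {f.__name__: float(np.sum(w * f(x, n)))
          for f in (type_alpha, type_beta, type_gamma)}
print(totals)
```

Each total should be 1 to within quadrature error.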
2. THE RELATION BETWEEN DISCRIMINATION AND ESTIMATION


The procedure envisaged in testing specific hypotheses in an A.R. scheme is part of a more 
general procedure of discrimination for stationary time-series. This involves a decision as to 
the nature of the model structure between the three families of alternative hypotheses to 
randomness given by the autoregressive (A.R.), moving average (M.A.) and linear cyclical 
(L.c.) processes in the manner which is now being advocated by Rudra (1954, 1955). Rudra’s 
procedure, using a modification of Whittaker's periodogram, enables a decision to be made
between the L.C. and M.A. or A.R. schemes. If the latter is adopted, it is suggested that tests
for the a.R. and M.A. schemes may be run simultaneously using a method previously given 
by Rudra (1952). 

We shall denote the procedure for picking out the 'best' family of hypotheses by
Type-discrimination (or briefly, T-discrimination), and that for selecting the optimum order of
hypothesis within a particular model type by Order-discrimination (O-discrimination). In
this paper, we shall be concerned with a method of O-discrimination in the A.R. scheme
applicable to fairly short series.

It is important to point out that discrimination and estimation are complementary 
aspects of the inference problem for stationary time-series. It is a convenient property of 
the a.R. scheme that these two procedures may be carried out independently of one another, 
discrimination preceding estimation. This is not so for the M.A. scheme where the inference 
problem reduces to a trial-and-error process in which discrimination and estimation cannot 
be isolated. In the case of the a.R. scheme, we are in disagreement with the view that one 
fits the scheme and then tests the goodness of fit by means of the methods of Quenouille 
(1947) or Whittle (1952). It is not possible to fit the scheme until the order k has been 
determined, and this is precisely what the goodness of fit test is capable of doing. In fact, 
the tests of Whittle and Quenouille may be expressed in terms of partial serial correlations. 


3. THE GENERAL METHOD OF APPROACH 


In the distribution theory of serial correlation coefficients, it is necessary to distinguish 
between two seemingly independent concepts: 

(i) The device of circularity of the serial correlations introduced by Hotelling, which makes 
it possible to derive exact distributions in the first place. 

(ii) The Dixon—Koopmans smoothing technique which leads to more tractable dis- 
tributions. 

In the case of random series, it has been shown that there exists high moment agreement 
(up to order n) between the smoothed and exact circular distributions, with the consequent 
assumption that the former provide adequate approximations to the latter. When the 
variables are autocorrelated, the first moments of these distributions do not agree and it 
has been concluded that the smoothed distributions are no longer satisfactory. It is 
suggested that this is to lose sight of the initial aim, viz. to approximate to the distributions 
of the non-circular statistics. 

In this paper, it will be shown that there is considerable evidence that not only are the 
smoothed distributions tractable but also that they provide much better approximations 
to the distributions of the non-circular statistics than do the exact circular distributions. 
This provides a partial solution to the controversial problem as to whether one is losing
power by working with circular statistics. It is suggested that one should always calculate
non-circular statistics and use the distribution theory of the smoothed circular statistics.
In I, the author derived an ad hoc method for constructing $p_s(v_2)$ (the subscript $s$ indicating
that the distribution has been smoothed). This method of approach may be generalized in
the following manner:

Writing the partial serial correlation between $x_t$ and $x_{t-k}$ in the form

\[ v_k = R_k^{(k+1)}/R_{k-1}, \]

where

\[ R_k = \begin{vmatrix} 1 & r_1 & \dots & r_k \\ r_1 & 1 & \dots & r_{k-1} \\ \vdots & & & \vdots \\ r_k & r_{k-1} & \dots & 1 \end{vmatrix} \]

and $R_k^{(k+1)}$ is the cofactor of the first element of the $(k+1)$st row of $R_k$, the method may be
considered in four stages:

(1) $v_1 = r_1, v_2, \dots, v_k$ are assumed to be independent as far as the smoothed distributions
are concerned.

(2) $r_k$ is expressed as a function of $v_1, v_2, \dots, v_k$.

(3) Using (1), the moments of $v_k$ may be derived by substituting for the known values of
the moments of $v_1, v_2, \dots, v_{k-1}$ and $r_k$.

(4) The joint distribution of $v_1, v_2, \dots, v_k$ may then be transformed to the joint distribution
of $r_1, r_2, \dots, r_k$ and the latter shown to have the same moments as the smoothed distribution.
It may then be concluded that stage (4) justifies the assumption made in (1).

The method for deriving the moments of the smoothed distribution has been described 
in detail in I and will be used quite extensively in what follows. 


4. NULL DISTRIBUTIONS FOR $\bar r_1$ AND $\bar v_2$


It is now proposed to extend the work on $p_s(v_2)$ considered in I so as to cover the case of
a fitted mean. It was pointed out in §9 of this publication that the method for $v_2$ could not
be used for $\bar v_2$ since:

(1) $p_s(\bar r_1)$ was not known.

(2) The assumption of independence of $\bar r_1$ and $\bar v_2$ was not tenable since it led to incorrect
bivariate moments for the constructed distribution $p(\bar r_1, \bar r_2)$.

It is necessary to revise these statements in the light of recent work. (1) is not true since
$p_s(\bar r_1)$ may be constructed from a knowledge of the smoothed moments given by Dixon
(1944), viz.

\[ \mu'_{2k} = \frac{(2k-1)(2k-3) \dots 1}{(n+1)(n+3) \dots (n+2k-1)}, \tag{4-1} \]

\[ \mu'_{2k-1} = -\frac{(2k-1)(2k-3) \dots 1}{(n-1)(n+3) \dots (n+2k-1)}. \tag{4-2} \]

Dixon fitted a type I curve using the first two moments, but it is obvious that since they
satisfy the relationship

\[ \mu'_{2k-1} = -\left(\frac{n+1}{n-1}\right)\mu'_{2k}, \]

these are the moments of the function

\[ p_s(\bar r_1) = \frac{1}{B(\frac{1}{2}, \frac{1}{2}n)}\, (1 - \bar r_1^2)^{\frac{1}{2}(n-2)} \left[1 - \frac{n+1}{n-1}\,\bar r_1\right]. \tag{4-3} \]

This is, in fact, another form for the expression obtained by writing $\rho = 0$ in equation (5-4)
given by Daniels. The extension to the case where $\rho \neq 0$ is treated generally in the next
section.

As far as (2) is concerned, it will be shown in this section that the assumption of
independence of $\bar r_1$ and $\bar v_2$ leads to correct moments of the form $E(\bar r_1^{2k}\bar r_2^{l})$, while those given by
$E(\bar r_1^{2k-1}\bar r_2^{l})$ differ from their smoothed values by amounts which are negligible for practical
purposes.

The assumption of independence of $\bar r_1$ and $\bar v_2$ leads to the following equations:

\[ E(1 - \bar r_2)^k = E(1 - \bar v_2)^k\, E(1 - \bar r_1^2)^k. \tag{4-4} \]


Using (4-1) and (4-2), these may be solved successively to give 








n -4 
ee ee 
n(n+3) _ P i n(n + 2) 
(n—1)(n+1) = §(1— 59) 1) 43)’ 
eet SO+8) on gp ere». 
(n—1)(n+1)(n+3) 2" (n+1)(n+3)(n+5)’ 


etc., and these result in the following forms for the lower moments of 7,: 


" 2 . l 3 
(7,3 6 erm __ 3 n+7 
(2%) ~ (n—1)(n+2)’ = ea =). 


It is not difficult to show that the general results take the form 


eee) ee 
oR) = — a1) +2). + 22)’ _ 








= (" er } tee nee me ss + 2k)’ “ 
and the latter may be rearranged to give 
(oR) = nm { ( 2k =i) (2k— B) i+ A... (2k +1)(2k—-1)... =e oak ) (2k — 3)... I}. 
n—1\|n(n+2).. “(a4 2k— 5 n(n+2).. HD n| (n+2)...(n+2k) 
It follows immediately that these are the moments of the function 
PG) = kt —apko-9 {(1 5 (vp), (47) 


where ko = B(, 4n—4)+ Ble, 4n— 4) BY, $n + 3). 













It may be seen that the dominant term agrees with the saddlepoint approximation given
by Daniels; it is also a type $\gamma$ distribution to the first order.

It is necessary now to derive the moments of the constructed distribution obtained by
transforming $p(\bar r_1, \bar v_2) = p(\bar r_1)\,p(\bar v_2)$. The problem is not entered upon in its greatest generality
here, since the method is similar to the one used in I for $p(v_2)$. For example, the moment
$\mu'_{2,1}$ of the constructed distribution may be evaluated as follows:

\[ E(\bar r_1^2\bar r_2) = E(\bar r_1^4) + E(\bar v_2)\,E\{\bar r_1^2(1 - \bar r_1^2)\} = \frac{n-3}{(n-1)(n+1)(n+3)}, \]

and this agrees with the value derived for the exact moment (and hence the smoothed
moment if its order is less than $n$) by the author in I.

In a similar manner, it may be shown that

\[ E(\bar r_1\bar r_2) = E(\bar r_1^3) + E(\bar v_2)\,E\{\bar r_1(1 - \bar r_1^2)\} = -\frac{1}{(n-1)^2}\left(\frac{n-3}{n+3}\right), \]

which is to be compared with the value given in I, viz. $-1/(n-1)^2$. Generally, it appears
that moments of the form $\mu'_{2k,l}$ are correct whilst those of the form $\mu'_{2k-1,l}$ do not agree with
the moments of the smoothed distribution.

It may thus be concluded that as far as the smoothed distributions are concerned, the
partial serial correlations are not independent when the serial correlations are corrected for
the mean. Also that the type $\beta$ and type $\gamma$ distributions in this case are not smoothed
distributions but represent dominant terms in these forms.


5. NULL DISTRIBUTIONS FOR $v_3$ AND $v_4$

The general method will now be applied to the construction of $p_s(v_3)$ and $p_s(v_4)$. As will
become apparent, the situation is much more complicated for these cases, and it seems
difficult to extend the analysis to cover the cases where $k > 4$. Since the method of approach
has now been described in detail, the algebraic manipulation will be omitted and only the
main stages in the argument included.

In terms of the notation of §3, we may write

\[ v_3 = R_3^{(4)}/R_2 = \frac{r_1(r_1^2 - r_2) - r_1r_2(1 - r_2) + r_3(1 - r_1^2)}{(1 - r_2)(1 + r_2 - 2r_1^2)}. \]

This may then be simplified to give

\[ v_3 = \frac{(r_3 - r_1) + r_1(1 - r_2)(1 - v_2)}{(1 - r_1^2)(1 - v_2^2)}, \tag{5-1} \]

and then rewritten to satisfy condition (2) of §3 in the form

\[ r_3 = v_3(1 - r_{20}^2) + r_1 - r_1(1 - r_1^2)(1 - v_2)^2, \tag{5-2} \]

where

\[ (1 - r_{k0}^2) = (1 - r_1^2)(1 - v_2^2) \dots (1 - v_k^2). \tag{5-3} \]

In a similar manner, it may be shown that

\[ v_4 = R_4^{(5)}/R_3 = \frac{1}{R_3} \begin{vmatrix} 1 & r_1 & r_2 & r_1 \\ r_1 & 1 & r_1 & r_2 \\ r_2 & r_1 & 1 & r_3 \\ r_3 & r_2 & r_1 & r_4 \end{vmatrix}. \]

This may be simplified using (5-2), yielding

\[ (v_4 - 1)(1 - r_{30}^2) = (r_4 - 1) + (r_1 - r_3)[r_1(1 - v_2) - v_3(1 + v_2)], \tag{5-4} \]

and rewritten to satisfy condition (2) of §3 in the form

\[ (v_4 - 1)(1 - r_{30}^2) = (r_4 - 1) + r_1^2(1 - r_1^2)(1 - v_2)^3 - 2r_1v_3(1 - r_1^2)(1 - v_2)(1 - v_2^2) + v_3^2(1 - r_1^2)(1 - v_2^2)(1 + v_2). \tag{5-5} \]

Equations (5-1) and (5-4) emerge as very simple formulae for calculating $v_3$ and $v_4$. The
saving in labour by using these formulae instead of the original determinantal forms is
considerable.
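The computational point can be illustrated by evaluating $v_3$ and $v_4$ from the simplified formulae and comparing them with the values obtained by solving the full set of linear equations for each order. This is a sketch only: the serial correlations are illustrative, the formulae coded are $v_3 = \{(r_3 - r_1) + r_1(1 - r_2)(1 - v_2)\}/\{(1 - r_1^2)(1 - v_2^2)\}$ and $(v_4 - 1)(1 - r_1^2)(1 - v_2^2)(1 - v_3^2) = (r_4 - 1) + (r_1 - r_3)[r_1(1 - v_2) - v_3(1 + v_2)]$, and `partial_by_linear_solve` is a helper named here.

```python
import numpy as np

def partial_by_linear_solve(r, k):
    """v_k as the last coefficient of the order-k autoregression fitted to r_1..r_k."""
    rho = np.concatenate(([1.0], r[:k]))
    R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
    return np.linalg.solve(R, rho[1:])[-1]

r = np.array([0.5, 0.1, -0.2, 0.05])             # illustrative r_1, ..., r_4
r1, r2, r3, r4 = r

v2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
# Simplified formula for v_3 (cf. (5-1)).
v3 = ((r3 - r1) + r1 * (1 - r2) * (1 - v2)) / ((1 - r1 ** 2) * (1 - v2 ** 2))
# Simplified formula for v_4 (cf. (5-4)), solved for v_4.
d3 = (1 - r1 ** 2) * (1 - v2 ** 2) * (1 - v3 ** 2)
v4 = 1 + ((r4 - 1) + (r1 - r3) * (r1 * (1 - v2) - v3 * (1 + v2))) / d3

direct = [partial_by_linear_solve(r, k) for k in (2, 3, 4)]
print([v2, v3, v4], direct)
```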


We now proceed to determine the smoothed moments of $v_3$ using the known results for
the moments of $v_2$ and also that

\[ E(r_j^{2k-1}) = 0, \qquad E(r_j^{2k}) = \frac{(2k-1)(2k-3) \dots 1}{(n+2)(n+4) \dots (n+2k)}. \tag{5-6} \]

Taking expectations throughout in (5-2) and using condition (1) of § 3, it follows immediately
that $E(v_3) = 0$. Taking the cubes of both sides and then expectations, it follows that
$E(v_3^3) = 0$, since $E(v_3) = 0$ and the remaining terms involve expectations of odd powers of $r_1$
and $r_3$ which are known to be zero from (5-6). Proceeding in an analogous manner, it may be
shown by means of an inductive argument that the odd moments are all zero and it remains
to find the even moments. These may be derived by the method used in the previous
section, viz. by considering (5-2) raised to successive powers. However, the algebra is
slightly simpler if we proceed as follows. Since $(r_3 - r_1)^2$ may be written in the form

\[ v_3^2(1 - r_1^2)^2(1 - v_2^2)^2 + r_1^2(1 - r_1^2)^2(1 - v_2)^4 - 2v_3r_1(1 - r_1^2)^2(1 - v_2^2)(1 - v_2)^2, \]

it follows by taking expectations that

\[ E(r_3 - r_1)^2 = \frac{n}{n+4}\,E(v_3^2) + \frac{n+8}{(n+2)(n+4)}. \tag{5-7} \]

It is possible to show by differentiation of the smoothed characteristic function in the
manner described in I that $E_s(r_1r_3) = 0$. It follows therefore that $E_s(r_3 - r_1)^2 = 2/(n+2)$, and
substitution in (5-7) yields $E_s(v_3^2) = 1/(n+2)$.
In a similar manner, we may write

\[ E(r_3 - r_1)^4 = \frac{n}{n+8}\,E(v_3^4) + \frac{3(n+14)(n+16)}{(n+2)(n+4)(n+6)(n+8)} + \frac{6n(n+12)}{(n+2)(n+4)(n+6)(n+8)}, \tag{5-8} \]

and the left-hand side may be evaluated using the following smoothed moments which are
obtained quite easily:
&(r,73) = 0. (5-9) 
100 — Bean) O vi 
118) = GE RaeHOFO oe 

and substitution in (5-8) yields $E_s(v_3^4) = 3/(n+2)(n+4)$.
Continuing the process, it appears that the moments of $v_3$ are those of type $\alpha$ form. In fact,
this is quite easily verified by writing down the joint distribution of $v_1, v_2, v_3$ and
transforming to $p_s(r_1, r_2, r_3)$ which may then be integrated to give $p_s(r_3)$, showing that the
constructed distribution has the correct marginal distributions.

It has not been found possible to treat the problem of comparing the multivariate moments
of the smoothed and constructed distributions in its greatest generality. The method will
be illustrated in the case of $E(r_1^{2s-1}r_3)$.

From (5-2) we obtain an expression for the moment of the constructed distribution in
the form

\[ E(r_1^{2s-1}r_3) = E(r_1^{2s}) - E\{r_1^{2s}(1 - r_1^2)\}\,E(1 - v_2)^2 = \frac{(2s-2)(2s-1)(2s-3) \dots 3.1}{(n+2)(n+4) \dots (n+2s+2)}. \tag{5-12} \]
The corresponding smoothed moment may be derived from the smoothed moment- 
generating function which is obtained by writing a; = 0 and k = 3 in (6-7), viz. 


27 
(9, 91, 9,95) = exp - | log {1 — 0) — 0, cos « — 8, cos 2a — 8, cos 3x} da| , 
0 


From this we may obtain &,(r?*—!7,) by differentiating partially with respect to 0, placing 
#,=0=90, and then expanding this in the form of an infinite series in powers of 
« = 0,/(1—9,) (for further details, see §7 of I). Thus 


0d n (3 n (27 cos 3ada 
fs es pes a Re a FO 8 
Ea exp| zl, log {1 — 0, — 0, cos a} aa| =|, (1-0, —B, 0084)’ 


and the first integral may be expanded in the form of an infinite series and integrated 
term by term giving 

oo wr (29 — 2) (29-1) (29-3). Vg 

an 2, - MAgGeiy ” * 
The other factor may be written as [}{1 + ,/(1 —?)}], and a series expansion for this has been 
given in I. The two series may then be multiplied giving finally 

od, i 2 Ut+j+2)(t+j+3)...(¢+2 
[setky-2= [a2t+ 3 eae 
and by integration it may be shown that $E_s(r_1^{2s-1}r_3)$ is the same as the expression given by
(5-12). In this manner it is possible to show that a large number of the simpler trivariate
moments of the constructed distribution agree with those derived from the smoothed
characteristic function.

The procedure for constructing $p_s(v_4)$ is similar in principle but leads to much more
complicated equations for the moments. It has been verified that these are the moments of
the type $\beta$ distribution.

Fairly extensive sampling experiments have been conducted on artificial series which
demonstrate the fact that the distributions of the $v_k$ alternate in the case of random series.
It has also been shown that the distribution theory provides a basis for O-discrimination in
the A.R. scheme. For example, in the case of schemes of orders 1, 2 and 3 using samples of
22 upwards, it has been shown that the number of correct classifications is high except in
the case of short series with very strong autocorrelation. It is conjectured that this effect
may be explained by the fact that $E(v_k)$ in an A.R.$(k-1)$ depends on $\alpha_1, \alpha_2, \dots, \alpha_{k-1}$ to
$O(n^{-1})$. It is not difficult to show that $p_s(v_k \mid \alpha_1 \dots \alpha_{k-1})$ is the same as the distribution of $v_k$
in a random scheme, so that it appears that the smoothed distributions are inadequate
when $n$ is fairly small and $\alpha_1 \dots \alpha_{k-1}$ are close to their limiting values.


6. SOME GENERAL THEOREMS ON NON-NULL DISTRIBUTIONS 


There seems to be some confusion in the literature about the derivation of non-null dis- 
tributions, and a number of those put forward are not density functions at all since they do 
not integrate out to unity. 

A fundamental theorem in this work is that due to Madow (1945), but this applies to 
exact distributions only. In this section it is proposed to apply a unified method in order to 
derive four distributions, viz. the exact and smoothed non-null joint distributions of circular 
serial correlations with and without mean correction. In view of its practical importance as 
an approximation to the exact joint distribution of non-circular statistics, the smoothed 
distribution with serials corrected for the mean is of particular interest. In the case of the 
Markoff scheme, it leads to a smoothed distribution for the first lag serial correlation 
corrected for the mean, analogous to Leipnik’s distribution. 


Defining the modified standardized variance and serial covariances by $c_j = \frac{1}{2}\sum_{t=1}^{n} x_tx_{t+j}$
with analogous expressions $\bar c_j$ when there is a mean correction, it may be shown that the
joint characteristic function of these statistics in the A.R.$(k)$ scheme

\[ \alpha_0x_t + \alpha_1x_{t-1} + \dots + \alpha_kx_{t-k} + \epsilon_t = 0 \quad (\alpha_0 = -1) \tag{6-1} \]

may be written in the form

\[ \phi(\theta_0, \dots, \theta_k) = |J_n| \prod_{j=1}^{n} \left\{ b_0 + b_1\cos\frac{2\pi j}{n} + \dots + b_k\cos\frac{2\pi jk}{n} \right\}^{-\frac{1}{2}} \tag{6-2} \]

and

\[ \bar\phi(\theta_0, \dots, \theta_k) = \frac{|J_n|}{1 - \alpha_1 - \dots - \alpha_k} \prod_{j=1}^{n-1} \left\{ b_0 + b_1\cos\frac{2\pi j}{n} + \dots + b_k\cos\frac{2\pi jk}{n} \right\}^{-\frac{1}{2}} \tag{6-3} \]

respectively, where $b_j = a_j - i\theta_j$, $a_j = 2\sum_{i=0}^{k-j} \alpha_i\alpha_{i+j}$ $(j = 1, 2, \dots, k)$ and $a_0 = \sum_{i=0}^{k} \alpha_i^2$. $|J_n|$ is the
Jacobian of the transformation from the $x$'s to the $\epsilon$'s in (6-1) and is given by the circulant

\[ |J_n| = \prod_{j=1}^{n} \{1 - \alpha_1\omega_j - \alpha_2\omega_j^2 - \dots - \alpha_k\omega_j^k\}. \tag{6-4} \]

It is necessary now to derive the smoothed forms of these functions. The smoothing process
will be defined formally in the following manner:

If

\[ \lim_{n \to \infty} \frac{1}{n}\log\phi(\theta_0, \dots, \theta_k) = L, \]

then the smoothed characteristic function will be defined by

\[ \phi_s(\theta_0, \dots, \theta_k) = \exp(nL). \tag{6-5} \]

Since this procedure is rather arbitrary, it will be necessary to show that this function
satisfies the relationship

\[ \phi_s(0, 0, \dots, 0) = 1. \tag{6-6} \]

It is obvious that it satisfies all the other necessary conditions for being a characteristic
function which have been given by Kendall (1948), since (6-2) satisfies these conditions and
the latter are unaffected by smoothing. In order that (6-5) is to represent a true characteristic
function, it must satisfy the further condition that, when inverted, it yields a positive
function.


The following theorem will now be stated and its proof given in outline.

THEOREM. Provided that $\alpha_1, \alpha_2, \dots, \alpha_k$ are such that (6-1) generates a stationary process, the
smoothed characteristic function corresponding to (6-2) is given by

\[ \phi_s(\theta_0, \dots, \theta_k) = \exp\left[ -\frac{n}{4\pi} \int_0^{2\pi} \log\{b_0 + b_1\cos\psi + \dots + b_k\cos k\psi\}\, d\psi \right], \tag{6-7} \]

and it satisfies condition (6-6).

The proof resolves itself into a consideration of the separate limits of the Jacobian and the
finite product in (6-2). Using the following lemma, it may be shown that the contribution
due to the Jacobian is zero.

LEMMA. Provided that the $\alpha_1, \dots, \alpha_k$ are such that (6-1) generates a stationary process,
$|J_n| \to 1$ as $n \to \infty$.

Proof. This follows by observing that $|J_n|$ may be written in the form

\[ (1 - \lambda_1^n)(1 - \lambda_2^n) \dots (1 - \lambda_k^n), \tag{6-8} \]

where $\lambda_1, \lambda_2, \dots, \lambda_k$ are the roots of the equation

\[ x^k - \alpha_1x^{k-1} - \dots - \alpha_k = 0. \]

From a general theorem due to Wold (1938), it is known that the roots of this equation have
moduli less than unity so that each term in (6-8) tends to 1 as $n \to \infty$.
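The lemma is easily illustrated numerically. The sketch below evaluates the product (6-4) over the $n$th roots of unity for an illustrative stationary second-order scheme and compares it with the factorized form (6-8); the coefficients and the function name `jacobian` are choices made for this example.

```python
import numpy as np

def jacobian(alpha, n):
    """|J_n| = prod over the n-th roots of unity w of {1 - alpha_1 w - ... - alpha_k w^k}."""
    w = np.exp(2j * np.pi * np.arange(1, n + 1) / n)
    powers = np.arange(1, len(alpha) + 1)
    factors = 1 - (np.asarray(alpha) * w[:, None] ** powers).sum(axis=1)
    return np.prod(factors).real

alpha = [0.9, -0.5]                                # illustrative alpha_1, alpha_2
lam = np.roots([1.0, -alpha[0], -alpha[1]])        # roots of x^2 - alpha_1 x - alpha_2 = 0

n_small, n_large = 10, 200
closed_small = np.prod(1 - lam ** n_small).real    # form (6-8)
j_small = jacobian(alpha, n_small)
j_large = jacobian(alpha, n_large)
print(abs(lam), j_small, closed_small, j_large)
```

With these coefficients both roots have modulus $\sqrt{0.5}$, so the scheme is stationary and the Jacobian is already indistinguishable from 1 at $n = 200$.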

The proof of the theorem is completed by showing that the product term in (6-2) tends to 
the expression (6-7), and the latter may be shown to satisfy (6-6) by justifying an interchange 
of limit operations. 

Since 

(l—a,—a,—...—a,)?—10)—10, — ... —10,.]* 
Bg, ---»9,) = G(Ap, «..; ~ += ——| , 
BO 194) = $0 ---0)| areca. 





it follows that the same relationship holds for the corresponding smoothed functions. We 
now proceed to express the non-null distributions corresponding to these characteristic 
functions in terms of their null distributions. For example, Fourier inversion of (6:2) 
yields 


«© a) k 
PlCop »++1€y |) = amines |, mi €£2585 $(Ogy Ox) TL AO, 
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whilst the transformation A; = (a;—70;) results in 


k 
P(Co; «++, Cy | %) = exp| — Sa; | Jy, | (Co, ---,Cy |; = 0). 

It is then possible 

(i) to transform to new variables Cp, 7, ..., 7x5 

(ii) to use the fact that c, is independently distributed of p(r,, 79, ...,7;,) when the a; = 0, 
and 

(iii) to integrate with respect to cy between the limits 0 and oo. 

In the case of (6-2) and (6-3) this leads to 








| J | 
13, Te, ++, %| &) = 3 MG ,; ..5 7 A, =O), (6-9 
pl 2 *e x | i) [a,+4,r,+...+a,r,] 1 x | ) ) 
S a Ls | J, | (l—a—... — ay) 4 *) nat . 
PUP Fay +4 Fe | Ox) = [ay +4,7,+ +a, Fo Dey Fy | = 0), (6-20) 


respectively, where the null hypothesis distributions 

P(ry, «+57, |%;, = 0) and p(7,,...,7,| a; = 0) 
have been given by Quenouille (1949) and for a more general case by Watson (1956). It is to 
be noted that (6-9) and (6-10) are different from the distributions for an s.r. (k) scheme 
given by Quenouille (1949). 

In the case of the smoothed distributions, the procedure is identical except that in this
case it is necessary to prove that p_s(c₀) = p(c₀) is independent of p_s(r₁, r₂, ..., r_k). This follows
by a trivial modification of a theorem due to Pitman (1937), replacing the discrete set of
characteristic vectors of the quadratic forms in the definition of the serial correlations by
a continuous set in the same manner as Koopmans (1942).

The above method then leads to the following smoothed distributions: 


1 


‘. [ag +a,7, +... +a,7;,]2" 





Pala +++ 7x: | %e) Pe(T1, «++ 7 | We = 0), (6-11) 


(1 — ta eee =. ae 
[4g +@,7,+...+ dF, 8? s 





p(T beng | a;) = (71; sony | a; = 0). (6-12) 


(6-11) is the generalization of Leipnik’s distribution, whilst (6-12) is new and corresponds 
to equation (9-14) given by Daniels. In the next section we shall be concerned with (6-11) 
and (6-12) when k = 1. 


7. NON-NULL DISTRIBUTIONS ON THE MARKOFF SCHEME 
In an A.R. (1), equations (6-9), (6-10), (6-11) and (6-12) reduce to 


p(r₁ | α₁) = (1 − α₁^n) p(r₁ | α₁ = 0) [1 + α₁² − 2α₁r₁]^(−n/2), (7-1)
p(r̄₁ | α₁) = [(1 − α₁^n)/(1 − α₁)] p(r̄₁ | α₁ = 0) [1 + α₁² − 2α₁r̄₁]^(−(n−1)/2), (7-2)
p_s(r₁ | α₁) = p_s(r₁ | α₁ = 0) [1 + α₁² − 2α₁r₁]^(−n/2), (7-3)
p_s(r̄₁ | α₁) = (1 − α₁)^(−1) p_s(r̄₁ | α₁ = 0) [1 + α₁² − 2α₁r̄₁]^(−(n−1)/2). (7-4)


The null hypothesis distributions for (7-1) and (7-2) have been given by R. L. Anderson 
(1942), but these exact distributions are of no use practically since, in addition to being 


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very complicated, they deviate considerably from the distributions of the non-circular 
statistics. This observation has already been made by Kendall (1954), who showed that 
whilst the expectation of the circular first lag serial correlation is given by 


Mt 1 ie ~3} 
E(T1)c, = a5 (1 +4) +0(-), (7-5) 
the corresponding formula for the non-circular coefficient is given by 
E(r̄₁)_n.c. = α₁ − (1/n)(1 + 3α₁) + O(1/n²). (7-6)


It will be shown in this section that the mean of the smoothed distribution (7-4) agrees to the
required order with (7-6), and this illustrates the agreement to O(n⁻¹) between the smoothed
circular and exact non-circular distributions.

(7-3) is the distribution first put forward by Leipnik (1947), and (7-4) is an analogous
distribution with the serial correlation corrected for the mean; this agrees with the
saddle-point approximation given by Daniels as equation (5-4).

The form for p_s(r̄₁ | α₁ = 0) has been given in § 4, and it is to be observed that this function
has a negative loop in the positive tail so that it is not a density function, although it
integrates out to unity. For moderate n, however, it behaves like a type II distribution.

We now proceed to show that the moments of (7-4) may be expressed as functions of the
moments of the Leipnik distribution. It is necessary first of all to derive the first four
moments of the latter distribution. The first two moments were derived by Leipnik himself
by a very complicated method starting from p_s(r₁, c₀ | α₁). A simpler method is required in
order to derive the higher moments of p_s(r₁ | α₁) and, also, this must be capable of extension
to cover the new case p_s(r̄₁ | α₁).

This is obtained by expanding the distribution in the form of an infinite series in the 
following manner: 


@ a tn & (48 4+5-—1)! ,, 
Pdts| aa) = Belts laa = O04 ODS Capi gt Mt 
where A = 2a,/(1 +7), so that | A| <1, and hence 
, (nak © (4n+2j-1)!,,. , . 
Mig = E (eH | ay) = (+a) ic * as AM Haj+ak» or 
' = m +2) —2)! Saad 7m 
a eae an 1)! j ar AN" asthe ae 


By substituting for the null moments, viz. those of the type II distribution, it is possible
to derive the non-null moments by manipulation of the resulting infinite series and using
the basic identity m′₀ = 1. The details are omitted here and the results quoted in the following
form:


m′₁ = nα₁/(n + 2),

m′₂ = 1/(n + 2) + n(n + 1)α₁²/[(n + 2)(n + 4)],

m′₃ = 3nα₁/[(n + 2)(n + 4)] + n(n + 1)α₁³/[(n + 4)(n + 6)],

m′₄ = 3/[(n + 2)(n + 4)] + 6n(n + 1)α₁²/[(n + 2)(n + 4)(n + 6)] + n(n + 1)(n + 3)α₁⁴/[(n + 4)(n + 6)(n + 8)].
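
The four moments quoted above can be checked numerically. The sketch below assumes that the null form of Leipnik's distribution is the Pearson type II density proportional to (1 − r²)^((n−1)/2), multiplies it by the non-null factor of (7-3), and integrates by Simpson's rule; the function names and the choice n = 20, α₁ = 0.4 are illustrative only and do not come from the paper.

```python
import math


def leipnik_pdf(r, alpha, n):
    """Leipnik's smoothed density of r1 (assumed form): type II null density
    times the non-null factor [1 + alpha^2 - 2*alpha*r]^(-n/2)."""
    # log of Gamma(n/2 + 1) / (Gamma(1/2) * Gamma((n + 1)/2))
    log_c = math.lgamma(n / 2 + 1) - math.lgamma(0.5) - math.lgamma((n + 1) / 2)
    null = math.exp(log_c) * (1.0 - r * r) ** ((n - 1) / 2)
    return null * (1.0 + alpha * alpha - 2.0 * alpha * r) ** (-n / 2)


def simpson(f, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def numeric_moment(i, alpha, n):
    """i-th moment about zero, by numerical integration over [-1, 1]."""
    return simpson(lambda r: r ** i * leipnik_pdf(r, alpha, n), -1.0, 1.0)


def closed_form_moments(alpha, n):
    """The four moments about zero quoted in the text."""
    a = alpha
    m1 = n * a / (n + 2)
    m2 = 1 / (n + 2) + n * (n + 1) * a ** 2 / ((n + 2) * (n + 4))
    m3 = (3 * n * a / ((n + 2) * (n + 4))
          + n * (n + 1) * a ** 3 / ((n + 4) * (n + 6)))
    m4 = (3 / ((n + 2) * (n + 4))
          + 6 * n * (n + 1) * a ** 2 / ((n + 2) * (n + 4) * (n + 6))
          + n * (n + 1) * (n + 3) * a ** 4 / ((n + 4) * (n + 6) * (n + 8)))
    return m1, m2, m3, m4


if __name__ == "__main__":
    alpha, n = 0.4, 20
    for i, exact in enumerate(closed_form_moments(alpha, n), start=1):
        print(i, exact, numeric_moment(i, alpha, n))
```

The zeroth numerical moment should also come out as unity, which is the identity m′₀ = 1 used in the derivation.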
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These lead to central moments in the form 


μ₂ = [1/(n + 2)] {1 − n(n − 2)α₁²/[(n + 2)(n + 4)]},

μ₃ = −6nα₁/[(n + 2)²(n + 4)] + 2n(n − 2)(3n − 2)α₁³/[(n + 2)³(n + 4)(n + 6)],

μ₄ = 3/[(n + 2)(n + 4)] − 6nα₁²(n² − 8n − 4)/[(n + 2)³(n + 4)(n + 6)] + 3nα₁⁴(n − 2)(n³ − 14n² + 12n − 8)/[(n + 2)⁴(n + 4)(n + 6)(n + 8)].
In a similar manner, (7-4) may be expanded in the form of an infinite series and the moments
expressed in the form

m̄′_i = [(1 + α₁²)^(−n′/2)/(1 − α₁)] Σ_{j=0}^{∞} [(½n′ + j − 1)!/((½n′ − 1)! j!)] λ^j μ̄′_{i+j},

where μ̄′_j are the moments of p_s(r̄₁ | α₁ = 0) and are given by (4-1) and (4-2). Using the notation
m′_i(n′) to denote the ith moment about zero of p_s(r₁ | α₁) but based on n′ = n − 1 instead of
n observations, it follows that (5-1) and (5-2) may be written in the form


(7-9) 


—# , 
Hor = Hear 





aos n—l\ , (7-10) 
Pay = — (5) Hake 
We now proceed to prove the following identity: 
m̄′_i = [1/(1 − α₁)] {m′_i(n′) − [(n + 1)/(n − 1)] m′_{i+1}(n′)}. (7-11)


It is necessary to distinguish between the two cases where 7 is odd or even. Writing 1 = 2k 
in (7-9), it follows that 

—, _ (1+a%)-¥'( = (4n'— 1425)! © (dn ee = 

Moy = (l—«,) ia o ( in ray” Passo p> 1 (4n’ — )!(4j- 1)! A?) 1 M33 + oh — 1{>? 
and substituting for the expressions given by (7-7) and (7-8) and using (7-9) and (7-10), we
obtain (7-11). The proof for i odd is similar. Since the values of m′_i are known for i = 1, 2, 3
and 4, it follows that m̄′_i may be derived for i = 1, 2, 3. If the necessary substitution is made
in (7-11), it may be shown that








i clidapenthce Snag) — Be) .... — , 
M, = gah tw ~1)(n +3) (1—a,) * (n?—1)(n +3)’ (7-12) 
™ 1 n 2na,(1 + 2a,) 16n aa 
6 SS) Sg ere ae Eye ieee Sakins me. ‘ey 7-13 
ms = 41 tn+3™ (n®—1)(n+3) (w+1)(n+3)(n+5) (1—ay) he 
= LON) BIO Sek?) YO Berg Se 
™s = — a1) (a +3) 243 (n+3)(n45)— tA) 
__ 8n(n—3) Bott n(n + 1) (7-14) 
(n?—1)(n+3) l—a, 1—a, (n+3)(n+5)(n+7): 
These lead to the following asymptotic expansions for the mean and variance: 
1 8 
Mi, = a4 7 (1+ 804)— 3 (lay) +7 “4 =) “5 :); (7-15) 


"2 1 = 1 
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The above results are interesting in that they show that 

(i) m̄′₁ agrees up to O(n⁻¹) with the first moment of the non-circular statistic independently
given by Kendall (1954) and Marriot & Pope (1954).

(ii) The term in O(n⁻²) makes a large contribution to the variance even for moderate α₁
(it may be seen that the contribution is positive if α₁ > 0.26). This is in agreement with the
conclusions of Marriot & Pope (1954) who conducted sampling experiments with artificial
A.R. (1) series and found that the expected variances to O(n⁻¹) were significantly less than
the observed variances. The variance of the non-circular statistic is not known to O(n⁻²) so
that we suggest that formula (7-16) be used as an estimate.














Table 1

Source                    α₁    n    k*   Observed   E(var.)     E(var.)     P(χ₁²)      P(χ₂²)
                                          variance   to O(n⁻¹)   to O(n⁻²)

Marriot & Pope (1954)     0.4   20   60   0.0529     0.0420      0.0466      0.09        0.23
                          0.4   40   40   0.0398     0.0210      0.0224      0.0007      0.002
                          0.4   60   30   0.0258     0.0140      0.0145      0.004       0.007
                          0.8   20   60   0.0419     0.0180      0.0434      <0.00001    0.55
                          0.8   40   40   0.0156     0.0090      0.0154      0.003       0.43

Rao & Som (1951)          0.8   15   50   0.0509     0.0240      0.0696      <0.00001    0.91
                          0.8   35   25   0.0212     0.0103      0.0186      0.002       0.28

Kendall (1949),           0.7   22   40   0.0462     0.0232      0.0390      0.0002      0.20
Series 9 (extended)

















* k = number of subsets on which the variances are based. 


Table 1 shows the improvement in the estimates of variance when the term in O(n⁻²) is
taken into account. The observed variances of r̄₁ in the first six series are those given by
Marriot & Pope (1954, Tables 9 and 14). The investigation was supplemented by calculating
the variance of r̄₁ for the following series:

(a) Two Markoff series with α₁ = 0.8 given by Rao & Som (1951), the serial correlations
being based on 50 sets of 15 items and 25 sets of 35 items respectively.

(b) 40 sets of 22 items from a series given by Kendall (1949) which was extended from
500 to 1000 terms by the author.

In all cases the serial correlations were non-circular but varied in definition slightly due
to the use of a pooled or unpooled variance in the denominators. It may be seen that the
agreement between observed and expected variances to O(n⁻²) is good except in two
cases.
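
As a small arithmetic check on Table 1, the tabulated expected variances to O(n⁻¹) appear to coincide with the familiar first-order expression (1 − α₁²)/n. The sketch below only verifies that observation against the printed figures; it is not a restatement of formula (7-16), and the function name is illustrative.

```python
def first_order_variance(alpha, n):
    """First-order variance of the lag-one serial correlation, (1 - alpha^2)/n.
    Assumed here to be the quantity tabulated as E(var.) to O(1/n)."""
    return (1.0 - alpha ** 2) / n


# (alpha_1, n, tabulated E(var.) to O(1/n)) as printed in Table 1
TABLE_1_FIRST_ORDER = [
    (0.4, 20, 0.0420),
    (0.4, 40, 0.0210),
    (0.4, 60, 0.0140),
    (0.8, 20, 0.0180),
    (0.8, 40, 0.0090),
    (0.8, 15, 0.0240),
    (0.8, 35, 0.0103),
    (0.7, 22, 0.0232),
]

if __name__ == "__main__":
    for alpha, n, tabulated in TABLE_1_FIRST_ORDER:
        print(alpha, n, tabulated, round(first_order_variance(alpha, n), 4))
```

The gap between these figures and the observed variances is what the O(n⁻²) column is intended to close.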

The analysis of this section has been extended to the Yule scheme where the situation is 
more complex. Details of this work will be published at a later stage. The same effect has 
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been observed here as with the Markoff scheme, viz. that estimates of variances and 
covariances to O(n⁻¹) are inadequate for autocorrelated series.


In conclusion, I would like to record my thanks to the Department of Scientific and 
Industrial Research for a maintenance grant during the period of this research. 
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MISCELLANEA 


A class of distributions for which the maximum-likelihood estimator is unbiased 
and of minimum variance for all sample sizes 


By D. E. BARTON 


University College London 


1. We consider a sample (x₁, ..., xₙ) randomly and independently drawn from a population whose
law p(x; θ) depends on an unknown parameter θ in such a manner as to admit a sufficient statistic for θ.

Blackwell (1947) noted that if t′ is an unbiased (but possibly inefficient) estimator of θ, then, if t is
a statistic sufficient for θ,

T = E(t′ | t) (1)
is purely a function of t (and therefore sufficient for θ) and

E(T) = θ. (2)
Further, since Var t′ = Var T + E[Var (t′ | t)] (3)

he noted that T has a variance not greater than that of any unbiased estimator from which it may be
derived. Hoel (1951) showed that a function of t, sufficient for θ and obeying (2), is effectively unique,
and so we may talk of the best unbiased estimator.

Since (as is typified by the example Blackwell gives) it is generally easy to find an unbiased estimator
of θ and the method of maximum likelihood yields a sufficient statistic, θ̂ say, for θ in these
circumstances, it would seem that it is always possible to find the best unbiased estimator of θ. Unfortunately,
E(t′ | θ̂) may be a function which it is not feasible to evaluate. For instance, in the example of equation (14)
the sample arithmetic mean, x̄, is unbiased whilst the sample geometric mean, g, is sufficient for θ, but
it is not possible to evaluate the integral for E(x̄ | g). The present note remarks that it is always possible
to choose a function φ = φ(θ) for which the maximum-likelihood estimate is the best unbiased estimator.
The method is extended to the multi-parameter case.


2. Koopman (1936) and Pitman (1936) showed that the most general form of probability law admitting
a sufficient statistic is
p(x; θ) = exp {α(θ) f(x) + β(θ) + g(x)}, (4)

where α, β, f, g are functions of the stated variables. The mean value of the log-likelihood function is thus

L = α(θ) f̄ + β(θ) + ḡ, (5)
where f̄ = Σ f(xᵢ)/n, ḡ = Σ g(xᵢ)/n (sums over i = 1, ..., n). (6)
Hence ∂L/∂θ = α′(θ) f̄ + β′(θ), (7)
0 = E(∂L/∂θ) = α′(θ) E(f̄) + β′(θ), (8)
and the maximum-likelihood equation is
0 = [∂L/∂θ] at θ = θ̂, i.e. 0 = α′(θ̂) f̄ + β′(θ̂). (9)
Thus if φ = φ(θ) = −β′(θ)/α′(θ), (10)
then E(φ(θ̂)) = φ(θ). (11)

Expressing p(x; θ) as p(x; φ) by means of (11) we have for the maximum-likelihood estimator

E(φ̂) = φ.
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Further, the form of p(x; ¢) is such that the equality conditions for the Cramer—Rao inequality hold 


and so
Var φ̂ = −[n E{∂² log p(x; φ)/∂φ²}]⁻¹ = [n A′(φ)]⁻¹, (12)

where A(φ) is the function α expressed as a function of φ by (11). Thus if

p(x; θ) = x^θ e^(−x)/Γ(θ + 1) (x ≥ 0), (13)
F(θ) is the digamma function of θ and
φ = d log Γ(θ + 1)/dθ = F(θ), (14)
then φ̂ = (1/n) log (x₁x₂ ... xₙ), (15)
and E(φ̂) = φ, Var φ̂ = F′(θ)/n. (16)
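
The example of equations (13) to (16) can be checked numerically for a single observation (n = 1), since the estimator is then log x. The sketch below is only an illustration under the reading that F(θ) = d log Γ(θ + 1)/dθ: it integrates log x and (log x)² against the law (13) by Simpson's rule and compares the results with finite-difference derivatives of log Γ. The function names, the value θ = 2 and the truncation point 80 are all assumptions made for the illustration.

```python
import math


def law_13(x, theta):
    """p(x; theta) = x^theta * exp(-x) / Gamma(theta + 1) for x > 0."""
    if x <= 0.0:
        return 0.0
    return math.exp(theta * math.log(x) - x - math.lgamma(theta + 1))


def simpson(f, a, b, steps=60000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def expected_log_power(theta, power, upper=80.0):
    """E[(log x)^power] under law_13, truncated at `upper`."""
    def integrand(x):
        if x <= 0.0:
            return 0.0  # x^theta (log x)^power tends to 0 as x -> 0 for theta > 0
        return math.log(x) ** power * law_13(x, theta)
    return simpson(integrand, 0.0, upper)


def F(theta, h=1e-4):
    """F(theta) = d log Gamma(theta + 1) / d theta, by central difference."""
    return (math.lgamma(theta + 1 + h) - math.lgamma(theta + 1 - h)) / (2 * h)


def F_prime(theta, h=1e-3):
    """F'(theta), by second central difference of log Gamma(theta + 1)."""
    g = math.lgamma
    return (g(theta + 1 + h) - 2 * g(theta + 1) + g(theta + 1 - h)) / (h * h)


if __name__ == "__main__":
    theta = 2.0
    mean = expected_log_power(theta, 1)
    var = expected_log_power(theta, 2) - mean ** 2
    print(mean, F(theta))        # E(log x) against F(theta)
    print(var, F_prime(theta))   # Var(log x) against F'(theta)
```

For a sample of n observations the mean is unchanged and the variance is divided by n, as in (16).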


It follows from the well-known theorem on sufficient estimators that not only the variance but also the
characteristic function of φ̂ is known. Thus if ψ(t) is the c.g.f. of φ̂,

∂ψ/∂t = iφ(α + it/n), (17)

whence, for instance, the shape constants for φ̂ are

γ₁² = A₂²/(nA₁³), γ₂ = (3A₂² − A₁A₃)/(nA₁³), (18)

where A_r = d^r A/dφ^r. (19)
3. Koopman and Pitman showed that the most general case of a law depending on p unknown 
parameters (6,,...,9,) admitting sufficient statistics had the form 


Pp 
px; 0) = exp 2 (0) f(x) + (A) +aw)., (20) 
r=1 
where 6 denotes (6,,...,9,). It follows as before that 
Pp 
O= & a7(0) &(f,)+hMA) (8 = 1,---5p) (21) 
r=1 
p m Se 
and 0= 2 a/,(0)f, +20) (8 =1,.--,p), (22) 
r=1 
here ta a 0 80) = (0 (23) 
where ol )= 20, 7 )s B; = 20, ( ), 
and 6 denotes (6,. Saey 6,). Hence, if 
pr = ¢,(9) = —B,/A (r = 1,...,p), (24) 
where A = | [a/,]| and B, is A with f; replacing «;, in the rth row, then (23) becomes 
j= (r = 1,...,p) (25) 
and (22) is &($,) =¢, (r=1,...,p)- (26) 


The minimum variance property follows from Cramer’s (1946) generalized inequality, namely, that, in 
his terminology, the concentration ellipsoid of ¢,,...,¢, lies inside that of any other p estimators of 


Pry oo0s Pp 


The variance-covariance matrix of the {¢,} is easily seen to be 


Peay" _ 1724, “* 
n 0, n | éa, |’ 
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where @a,/0¢, denotes the partial differential of «, (expressed as a function of ¢,, ..., 6) with respect to ¢,. 
The right-hand side of (21) gives a simpler form since it does not require matrix inversion. 


This follows also from the generalization of (17); namely 


ae it; 
oY = idt(a,+—2) (r=1,....p). 28 
at, id (« = (r=1 p (28) 


where y is the joint c.a.r. of {¢,} in terms of dummy variables {¢,} and ¢*(a;) denotes the value of 
¢, when the variables {0;} are written as functions of {«,}. 


4. For example, the Pearson type III law of known origin 


0 90 + a etn 
P(x; 1» 99) = 1 T(@,+1) (0<2), 
if $1 = (92+1)/0,, $2 = F(9,)—log 9, 


yields explicit unbiased maximum-likelihood estimators of ¢,,@, with variance-covariance matrix 
1 tae 1)/02 1/0, 

n 1/0, F(0.)_|° 

The type I law pix; 0,,0,) = (1—a)*29/B(O,+1,0,+1) (O<a<1) 

gives $1 = F(9;)—F(9,+9,+1), $9 = F(A) —F(0g+9, +1), 


with the variance-covariance matrix 


S eee vee — F(9,+6, +1) 
n — F(0,+6,+1) F(O,) — F(0,+9,+ 1) P 
It should be noted that these results do not apply to the multi-parameter case where interest is focused 


on one of the {0,} say 0,, and the variables are so integrated out that we are left with the distribution of 
one sufficient statistic dependent solely on @,. For instance, when the variables (x;) are the bivariate 
observations (y;,2;) from the five-parameter normal surface and we derive the marginal law of r (the 
sample correlation coefficient) which is dependent solely on p (its population value), the method yields 


rr oe 
4(1—r?) n—3/(1—p?) 
i.e. this relation results from the analogous equations to (8) and (9). It is essentially different to equation 


(11) though it has interesting similarity. This is a particular case of a general result to be published by 
B. I. Harley. 


5. In conclusion, it may be said that this note gives a method for obtaining a best unbiased sufficient 
statistic in cases where Blackwell’s method does not yield tractable results. It may only be reasonably 


applied when the property of unbiasedness is of more importance than the functional form of the 
parameter estimated. 
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Further critical values for the two-means problem 


By W. H. TRICKETT, B. L. WELCH anp G. 8. JAMES 
University of Leeds 





Tables have already been given of one-sided 5 and 1 % critical values of a quantity, v, used in comparing
two means, and more generally used in a wide class of comparisons whose accuracy depends in a certain
way on two variances separately estimated (A. A. Aspin (1949), reproduced as Table 11 of Pearson &
Hartley (1954)). We now present tables of two-sided 5 and 1 % critical values of v, i.e. the one-sided
2.5 and 0.5 % values.*

If y is a normally distributed estimate of a population parameter η with sampling variance λ₁σ₁² + λ₂σ₂²,
where λ₁ and λ₂ are known positive constants, and if s₁² and s₂² are estimates of σ₁² and σ₂² distributed in
the standard fashion with ν₁ and ν₂ degrees of freedom respectively, and if y, s₁² and s₂² are all independent,
then we define

v = (y − η)/√(λ₁s₁² + λ₂s₂²).
v has a distribution depending on the unknown ratio of σ₁² to σ₂², but, writing
c = λ₁s₁²/(λ₁s₁² + λ₂s₂²),

we may seek a function V(c; ν₁, ν₂, α), depending on the ratio of the observed s₁² and s₂², with the property
that
Pr {v > V(c; ν₁, ν₂, α)} = α,

where α is a prescribed probability. (Knowledge of such a function would permit us, for instance, to
make confidence statements about η in the usual manner.) The formulation of the problem in these terms
is due to Welch (1947).
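
For the two-means case described in the footnote to the tables (y the difference of the sample means, λ = 1/n and ν = n − 1 for each sample), the quantities v and c needed to enter the tables can be computed as in the following sketch. The function name and the sample figures are hypothetical; only the definitions of v and c are taken from the text.

```python
import math
from statistics import mean, variance


def two_means_statistic(sample1, sample2, eta=0.0):
    """Return (v, c, nu1, nu2) for comparing two means.

    eta is the hypothesised value of the difference of population means.
    """
    n1, n2 = len(sample1), len(sample2)
    lam1, lam2 = 1.0 / n1, 1.0 / n2          # lambda_1 = 1/n1, lambda_2 = 1/n2
    s1_sq = variance(sample1)                # unbiased estimate of sigma_1^2
    s2_sq = variance(sample2)                # unbiased estimate of sigma_2^2
    y = mean(sample1) - mean(sample2)
    denom = lam1 * s1_sq + lam2 * s2_sq
    v = (y - eta) / math.sqrt(denom)
    c = lam1 * s1_sq / denom                 # ratio used as the table argument
    return v, c, n1 - 1, n2 - 1


if __name__ == "__main__":
    v, c, nu1, nu2 = two_means_statistic([1, 2, 3, 4, 5], [0, 2, 4])
    print(v, c, nu1, nu2)
```

The observed v would then be compared with the tabulated V(c; ν₁, ν₂, α); the toy samples here are smaller than the tables cover and serve only to show the arithmetic.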

In the present tables, and in the previous tables calculated by Mrs Aspin, the object has been to give
values of V(c; ν₁, ν₂, α) to two decimal places. The methods of computation described by Welch do not
allow this to be done for very small values of ν₁ and ν₂. In the present tables for α = 0.025 we have
ν₁, ν₂ ≥ 8, and for α = 0.005 we have ν₁, ν₂ ≥ 10. Even with this limitation, in a few instances there is
doubt about the second decimal place. We are, in fact, admitting the possibility of the occasional
occurrence of errors up to 1 unit in the last figure, rather than the usual ½ unit due to rounding off.

Where interpolation is necessary, direct linear interpolation should be used in all directions in the
tables except in any panel which is bordered by ν₁ or ν₂ = ∞. In such cases harmonic linear interpolation
may be used.
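
The two kinds of interpolation can be sketched as follows. Here "harmonic linear interpolation" is read as interpolation that is linear in the reciprocal of the degrees of freedom, which is an assumption about the authors' meaning; the function names are illustrative, and the table entries used in the example are taken from the 2½% table (ν₂ = 8).

```python
def linear_interp(x, x0, y0, x1, y1):
    """Direct linear interpolation between (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def harmonic_interp_to_infinity(nu, nu0, y0, y_inf):
    """Interpolate between nu0 and infinity, linearly in 1/nu (assumed reading).

    y0 is the tabulated value at nu0 and y_inf the value at nu = infinity.
    """
    t = (1.0 / nu) / (1.0 / nu0)   # equals 1 at nu0 and 0 at infinity
    return y_inf + t * (y0 - y_inf)


if __name__ == "__main__":
    # nu2 = 8, nu1 = 8: entries 2.20 at c = 0.2 and 2.14 at c = 0.3
    print(linear_interp(0.25, 0.2, 2.20, 0.3, 2.14))
    # nu2 = 8, c = 1.0: entries 2.09 at nu1 = 20 and 1.96 at nu1 = infinity
    print(harmonic_interp_to_infinity(30, 20, 2.09, 1.96))
```

Interpolated values inherit the stated uncertainty of up to one unit in the second decimal place.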

For an example of the use of the tables reference should be made to Aspin (1949) or Pearson & Hartley 
(1954, p. 27). 


Some of the calculations made in computing and checking the present tables were carried out on the 
Manchester University (MK.1) automatic electronic computer. We would like to acknowledge the 
assistance of Dr D. W. J. Cruickshank, Miss D. E. Pilling and Mr J. F. P. Donovan, of the Department 
of Inorganic and Structural Chemistry, Leeds University, in helping one of us (G.S.J.) to make use of 
this machine. 
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* The new tables are arranged in precisely the same form as the earlier tables, but their descriptive 
title has been slightly modified. 
Upper 2½% critical values of v = (y − η)/√(λ₁s₁² + λ₂s₂²) (i.e. upper 5% critical values of |v|)*

                                    λ₁s₁²/(λ₁s₁² + λ₂s₂²)
 ν₂    ν₁     0.0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0

  8     8    2.31   2.25   2.20   2.14   2.10   2.08   2.10   2.14   2.20   2.25   2.31
       10    2.31   2.25   2.20   2.15   2.10   2.08   2.08   2.11   2.14   2.19   2.23
       12    2.31   2.25   2.20   2.15   2.10   2.07   2.07   2.08   2.11   2.14   2.18
       15    2.31   2.25   2.20   2.15   2.10   2.07   2.05   2.06   2.08   2.10   2.13
       20    2.31   2.25   2.20   2.15   2.10   2.06   2.04   2.04   2.05   2.07   2.09
        ∞    2.31   2.25   2.20   2.14   2.09   2.05   2.01   1.99   1.97   1.96   1.96

 10     8    2.23   2.19   2.14   2.11   2.08   2.08   2.10   2.15   2.20   2.25   2.31
       10    2.23   2.18   2.14   2.11   2.08   2.06   2.08   2.11   2.14   2.18   2.23
       12    2.23   2.18   2.14   2.10   2.07   2.06   2.06   2.08   2.11   2.14   2.18
       15    2.23   2.18   2.14   2.10   2.07   2.05   2.05   2.06   2.08   2.10   2.13
       20    2.23   2.18   2.14   2.10   2.07   2.05   2.04   2.04   2.05   2.06   2.09
        ∞    2.23   2.18   2.14   2.10   2.06   2.03   2.00   1.98   1.97   1.96   1.96

 12     8    2.18   2.14   2.11   2.08   2.07   2.07   2.10   2.15   2.20   2.25   2.31
       10    2.18   2.14   2.11   2.08   2.06   2.06   2.07   2.10   2.14   2.18   2.23
       12    2.18   2.14   2.11   2.08   2.06   2.05   2.06   2.08   2.11   2.14   2.18
       15    2.18   2.14   2.11   2.08   2.06   2.04   2.04   2.06   2.08   2.10   2.13
       20    2.18   2.14   2.11   2.08   2.05   2.04   2.03   2.03   2.05   2.06   2.09
        ∞    2.18   2.14   2.11   2.07   2.04   2.02   1.99   1.98   1.97   1.96   1.96

 15     8    2.13   2.10   2.08   2.06   2.05   2.07   2.10   2.15   2.20   2.25   2.31
       10    2.13   2.10   2.08   2.06   2.05   2.05   2.07   2.10   2.14   2.18   2.23
       12    2.13   2.10   2.08   2.06   2.04   2.04   2.06   2.08   2.11   2.14   2.18
       15    2.13   2.10   2.08   2.05   2.04   2.03   2.04   2.05   2.08   2.10   2.13
       20    2.13   2.10   2.08   2.05   2.04   2.03   2.03   2.03   2.05   2.06   2.09
        ∞    2.13   2.10   2.07   2.05   2.02   2.00   1.99   1.97   1.97   1.96   1.96

 20     8    2.09   2.07   2.05   2.04   2.04   2.06   2.10   2.15   2.20   2.25   2.31
       10    2.09   2.06   2.05   2.04   2.04   2.05   2.07   2.10   2.14   2.18   2.23
       12    2.09   2.06   2.05   2.03   2.03   2.04   2.05   2.08   2.11   2.14   2.18
       15    2.09   2.06   2.05   2.03   2.03   2.03   2.04   2.05   2.08   2.10   2.13
       20    2.09   2.06   2.05   2.03   2.02   2.02   2.02   2.03   2.05   2.06   2.09
        ∞    2.09   2.06   2.04   2.02   2.01   1.99   1.98   1.97   1.96   1.96   1.96

  ∞     8    1.96   1.96   1.97   1.99   2.01   2.05   2.09   2.14   2.20   2.25   2.31
       10    1.96   1.96   1.97   1.98   2.00   2.03   2.06   2.10   2.14   2.18   2.23
       12    1.96   1.96   1.97   1.98   1.99   2.02   2.04   2.07   2.11   2.14   2.18
       15    1.96   1.96   1.97   1.97   1.99   2.00   2.02   2.05   2.07   2.10   2.13
       20    1.96   1.96   1.96   1.97   1.98   1.99   2.01   2.02   2.04   2.06   2.09
        ∞    1.96   1.96   1.96   1.96   1.96   1.96   1.96   1.96   1.96   1.96   1.96

* y is normally distributed about η with variance λ₁σ₁² + λ₂σ₂², and s₁² and s₂² are independent estimates
of σ₁² and σ₂², based on ν₁ and ν₂ degrees of freedom respectively. λ₁ and λ₂ are known constants.

In the problem of comparing the means of samples taken from two normal populations, put
y = (x̄₁ − x̄₂), ν₁ = (n₁ − 1), ν₂ = (n₂ − 1), λ₁ = 1/n₁ and λ₂ = 1/n₂, where n₁ and n₂ are the sample sizes.
Upper ½% critical values of v = (y − η)/√(λ₁s₁² + λ₂s₂²) (i.e. upper 1% critical values of |v|)*

                                    λ₁s₁²/(λ₁s₁² + λ₂s₂²)
 ν₂    ν₁     0.0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0

 10    10    3.17   3.08   3.00   2.90   2.82   2.79   2.82   2.90   3.00   3.08   3.17
       12    3.17   3.08   3.00   2.91   2.82   2.78   2.79   2.84   2.91   2.98   3.05
       15    3.17   3.08   3.00   2.91   2.82   2.77   2.76   2.78   2.83   2.89   2.95
       20    3.17   3.08   3.00   2.91   2.82   2.76   2.73   2.74   2.76   2.80   2.85
       30    3.17   3.08   3.00   2.91   2.82   2.75   2.71   2.69   2.70   2.72   2.75
        ∞    3.17   3.08   2.99   2.91   2.82   2.74   2.67   2.63   2.60   2.58   2.58

 12    10    3.05   2.98   2.91   2.84   2.79   2.78   2.82   2.91   3.00   3.08   3.17
       12    3.05   2.98   2.91   2.84   2.78   2.76   2.78   2.84   2.91   2.98   3.05
       15    3.05   2.98   2.91   2.84   2.78   2.75   2.75   2.78   2.83   2.89   2.95
       20    3.05   2.98   2.91   2.84   2.78   2.74   2.72   2.73   2.76   2.80   2.85
       30    3.05   2.98   2.91   2.84   2.77   2.73   2.70   2.69   2.70   2.72   2.75
        ∞    3.05   2.98   2.91   2.84   2.77   2.71   2.65   2.62   2.59   2.58   2.58

 15    10    2.95   2.89   2.83   2.78   2.76   2.77   2.82   2.91   3.00   3.08   3.17
       12    2.95   2.89   2.83   2.78   2.75   2.75   2.78   2.84   2.91   2.98   3.05
       15    2.95   2.89   2.83   2.78   2.74   2.73   2.74   2.78   2.83   2.89   2.95
       20    2.95   2.89   2.83   2.78   2.74   2.71   2.71   2.73   2.76   2.80   2.85
       30    2.95   2.89   2.83   2.78   2.73   2.70   2.68   2.68   2.70   2.72   2.75
        ∞    2.95   2.89   2.83   2.77   2.72   2.67   2.64   2.61   2.59   2.58   2.58

 20    10    2.85   2.80   2.76   2.74   2.73   2.76   2.82   2.91   3.00   3.08   3.17
       12    2.85   2.80   2.76   2.73   2.72   2.74   2.78   2.84   2.91   2.98   3.05
       15    2.85   2.80   2.76   2.73   2.71   2.71   2.74   2.78   2.83   2.89   2.95
       20    2.85   2.80   2.76   2.73   2.70   2.70   2.70   2.73   2.76   2.80   2.85
       30    2.85   2.80   2.76   2.72   2.69   2.68   2.67   2.68   2.70   2.72   2.75
        ∞    2.85   2.80   2.76   2.72   2.68   2.65   2.62   2.60   2.59   2.58   2.58

 30    10    2.75   2.72   2.70   2.69   2.71   2.75   2.82   2.91   3.00   3.08   3.17
       12    2.75   2.72   2.70   2.69   2.70   2.73   2.77   2.84   2.91   2.98   3.05
       15    2.75   2.72   2.70   2.68   2.68   2.70   2.73   2.78   2.83   2.89   2.95
       20    2.75   2.72   2.70   2.68   2.67   2.68   2.69   2.72   2.76   2.80   2.85
       30    2.75   2.72   2.69   2.67   2.66   2.66   2.66   2.67   2.69   2.72   2.75
        ∞    2.75   2.72   2.69   2.66   2.64   2.62   2.60   2.59   2.58   2.58   2.58

  ∞    10    2.58   2.58   2.60   2.63   2.67   2.74   2.82   2.91   2.99   3.08   3.17
       12    2.58   2.58   2.59   2.62   2.65   2.71   2.77   2.84   2.91   2.98   3.05
       15    2.58   2.58   2.59   2.61   2.64   2.67   2.72   2.77   2.83   2.89   2.95
       20    2.58   2.58   2.59   2.60   2.62   2.65   2.68   2.72   2.76   2.80   2.85
       30    2.58   2.58   2.58   2.59   2.60   2.62   2.64   2.66   2.69   2.72   2.75
        ∞    2.58   2.58   2.58   2.58   2.58   2.58   2.58   2.58   2.58   2.58   2.58

* y is normally distributed about η with variance λ₁σ₁² + λ₂σ₂², and s₁² and s₂² are independent estimates
of σ₁² and σ₂², based on ν₁ and ν₂ degrees of freedom respectively. λ₁ and λ₂ are known constants.

In the problem of comparing the means of samples taken from two normal populations, put
y = (x̄₁ − x̄₂), ν₁ = (n₁ − 1), ν₂ = (n₂ − 1), λ₁ = 1/n₁ and λ₂ = 1/n₂, where n₁ and n₂ are the sample sizes.
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An approximation for the symmetric, quadrivariate normal integral 


By J. A. McFADDEN 
U.S. Naval Ordnance Laboratory, Silver Spring, Maryland 


The generalized tetrachoric series for the multivariate normal integral, as given by A. C. Aitken 
(unpublished), Kendall (1941, 1945) and Moran (1948), is not considered useful for computation, as 
stated by David (1953). The purpose of this note is to sum this series approximately in one special case, 
namely, the case of four variables when all six correlation coefficients are equal. 

Suppose that x₁, x₂, x₃ and x₄ obey a quadrivariate normal distribution and that all the off-diagonal
elements of the correlation matrix are equal to ρ. Let P₄(ρ) be the probability that all four variables are
simultaneously positive. Following Moran’s paper, we may write

P₄(ρ) = 1/16 + (3/(4π)) sin⁻¹ρ + (1/(4π²)) S(ρ), (1)
where S(p) = 3p?— 4p? + 7p — 12p5 + 3388 — 447 + 348258 — (2) 
The series (2) appears hopeless for computation. Proceeding formally, we make the transformation 
ρ = sin φ (−sin⁻¹(1/3) ≤ φ ≤ ½π), (3)

and expand S(ρ) in powers of φ; then
S(p) = 36% — 443 + 644 — 1065 + 184% — 133347 + 34848 _ .. (4) 


The series (4) can be summed approximately with the aid of a non-linear sequence-to-sequence
transformation given by Shanks (1955). If A₀, A₁, A₂, A₃ and A₄ are the first five elements of a sequence,
then Shanks’s second-order transform e₂ provides a new sequence, the first element of which is

          | A₀    A₁    A₂  |     | 1     1     1   |
e₂(A₂) =  | ΔA₀   ΔA₁   ΔA₂ |  ÷  | ΔA₀   ΔA₁   ΔA₂ | , (5)
          | ΔA₁   ΔA₂   ΔA₃ |     | ΔA₁   ΔA₂   ΔA₃ |

where ΔAₙ = Aₙ₊₁ − Aₙ.
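
The transform (5) is a ratio of two third-order determinants and is easily evaluated directly. The sketch below applies it to the first five partial sums of (4), using the legible coefficients 3, −4, 6, −10, 18 of φ² to φ⁶, and compares the result with the rational function φ²(3 + 5φ)/((1 + φ)(1 + 2φ)), which is the closed form this calculation appears to produce. The function names are illustrative.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def shanks_e2(seq):
    """Shanks's second-order transform applied to five consecutive elements."""
    if len(seq) != 5:
        raise ValueError("exactly five elements A0..A4 are required")
    d = [seq[j + 1] - seq[j] for j in range(4)]   # forward differences
    num = det3([[seq[0], seq[1], seq[2]],
                [d[0], d[1], d[2]],
                [d[1], d[2], d[3]]])
    den = det3([[1.0, 1.0, 1.0],
                [d[0], d[1], d[2]],
                [d[1], d[2], d[3]]])
    return num / den


def partial_sums_of_series_4(phi):
    """First five partial sums of 3*phi^2 - 4*phi^3 + 6*phi^4 - 10*phi^5 + 18*phi^6."""
    coefficients = [3, -4, 6, -10, 18]
    sums, total = [], 0.0
    for j, cj in enumerate(coefficients):
        total += cj * phi ** (j + 2)
        sums.append(total)
    return sums


def closed_form(phi):
    """Rational function that the transform yields for these five terms."""
    return phi ** 2 * (3 + 5 * phi) / ((1 + phi) * (1 + 2 * phi))


if __name__ == "__main__":
    for phi in (0.5, 1.0):
        print(phi, shanks_e2(partial_sums_of_series_4(phi)), closed_form(phi))
```

At φ = 1 the raw partial sums oscillate with growing amplitude, yet the transformed value remains finite, which is why the transformation is useful here.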











Table 1

1/ρ     P₄(ρ) from (6)     Ruben (1954)

 1      0.50121 388        0.50000 00000
 2      0.20000 802        0.20000 00000
 3      0.14973 847        0.14973 76529
 4      0.12647 941        0.12647 9249
 5      0.11301 258        0.11301 25446
 6      0.10422 406        0.10422 4047
 7      0.09803 672        0.09803 672
 8      0.09344 504        0.09344 505
 9      0.08990 265        0.08990 27
10      0.08708 697        0.08708 71
11      0.08479 535        0.08479 5
12      0.08289 404        0.08289 4








Again proceeding formally, we apply e₂ to the first five partial sums of the series (4) and replace S(ρ)
by the first element of the transformed sequence. Then the result for the probability (1) is

P₄(ρ) ≈ 1/16 + (3/(4π)) φ + (1/(4π²)) φ²(3 + 5φ)/[(1 + φ)(1 + 2φ)], (6)

where φ is defined by equation (3).
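
Equation (6), read as P₄(ρ) ≈ 1/16 + 3φ/(4π) + φ²(3 + 5φ)/[4π²(1 + φ)(1 + 2φ)] with φ = sin⁻¹ρ, is simple to evaluate. The following sketch (the function name is illustrative) reproduces the first two entries of the approximation column of Table 1 under that reading.

```python
import math


def p4_approx(rho):
    """Approximate probability that four equicorrelated standard normal
    variables are all positive, using the reading of equation (6) above."""
    phi = math.asin(rho)
    s = phi ** 2 * (3 + 5 * phi) / ((1 + phi) * (1 + 2 * phi))
    return 1.0 / 16.0 + 3.0 * phi / (4.0 * math.pi) + s / (4.0 * math.pi ** 2)


if __name__ == "__main__":
    for inverse_rho in range(1, 13):
        print(inverse_rho, p4_approx(1.0 / inverse_rho))
```

The value at ρ = ½ can be compared with the exact probability 1/5 shown in the table's second column.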
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Since we have not justified the previous steps mathematically, we must test the result numerically. 

Table 1 gives a comparison of the approximation (6) with the results of the numerical integration of 
Ruben (1954) for the cases 1/p = 1, 2,...,12. [If p = 0, equation (6) is exact.] 

The agreement in Table 1 is exceptionally good except when ρ is near unity. If greater accuracy is
desired, it may be obtained by the use of seven terms of equation (4) and the application of more
complicated non-linear transformations, as given by Shanks (1955).*

The approximation (6) is considerably more accurate than another one described elsewhere by the 
author (McFadden, 1955), which was based on an analogy with Pélya’s urn scheme; yet the formula (6) 
is no more complicated than the other. 

If the multivariate normal integral is known for four variables, the result for five follows immediately, 
as shown by David (1953). 

Attempts to extend the present method to n variables, with all correlation coefficients equal, and also
to the general quadrivariate case (with correlation coefficients unequal) have been unsuccessful.
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Weighted probits allowing for a non-zero response in the controls 


By M. J. R. HEALY 
Rothamsted Experimental Station 


Black (1950) has shown how standard probit calculations can be carried out using tables of ‘weighted 
probits’. These functions are defined by 


7, weighted minimum probit =w(Y—P/Z), 
7, weighted maximum probit =w(Y+Q/Z), 


where Y is the probit value corresponding to a proportion P, Z is the corresponding ordinate, w = Z²/(PQ)
is the probit weighting function and Q = 1 − P. If r subjects are observed to respond and s not to respond,
the appropriate weight is rw + sw and the working probit multiplied by the weight is r times the weighted
maximum probit plus s times the weighted minimum probit. Since r and s are usually fairly small integers
these are simple computations, and the necessary table look-ups
are more convenient than those required when using the tables of Finney (1952) or Fisher & Yates (1953).
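
The weighted probit functions can be computed directly rather than read from tables, as in the sketch below. It assumes the conventional definition of the probit as the normal equivalent deviate plus 5; the function names are illustrative and are not taken from Black (1950).

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def weighted_probits(Y):
    """Return (w, weighted minimum probit, weighted maximum probit) at probit Y."""
    z = Y - 5.0                       # assumed convention: probit = deviate + 5
    P = _STANDARD_NORMAL.cdf(z)       # expected proportion responding
    Q = 1.0 - P
    Z = _STANDARD_NORMAL.pdf(z)       # ordinate of the normal curve
    w = Z * Z / (P * Q)               # probit weighting function
    return w, w * (Y - P / Z), w * (Y + Q / Z)


def working_probit(Y, r, s):
    """Working probit and weight when r subjects respond and s do not."""
    w, weighted_min, weighted_max = weighted_probits(Y)
    weight = (r + s) * w
    return (r * weighted_max + s * weighted_min) / weight, weight


if __name__ == "__main__":
    print(weighted_probits(5.5))
    print(working_probit(5.5, 7, 3))
```

Dividing the weighted sum by the weight recovers the usual working probit Y + (p − P)/Z, where p = r/(r + s).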

Weighted probits have certain advantages when probit calculations are carried out on high-speed 
automatic computers. In desk computation, the working probit is usually found by adding a fraction of 
the range (1/Z) to the minimum working probit (Y — P/Z). Neither of these functions is particularly well 
behaved, each becoming numerically large for extreme values of Y, and Finney (1952) has provided an 
extensive table of double entry to by-pass this stage of the computation. This table is too large to be 
held in the store of present-day machines. By contrast, the weighted probit functions behave reasonably 
from a numerical point of view in the range of practical importance, and, since high-order interpolation 
is a fast and simple process on automatic computers, wide-interval tables with comparatively few 
entries can be used which can be accommodated in the store of the machine without difficulty. It is of 
course possible to avoid the use of tables altogether by computing the functions ab initio as required, 
but in the present instance the necessary routines would be rather complicated and would probably 
occupy almost as much room in the store as the tables. The process would also be more time-consuming 
than a simple table look-up. A further advantage in the use of tables is that the graduating function 


* The author (J.A.M.) recommends the use of the iterated first-order transformation for each 
numerical value of $\rho$. 


can be changed from probits to, for example, logits or angles without any alteration in the main 
programme, simply by inserting the appropriate tables. 

An extension of the simple probit technique is required when it is necessary to allow for a non-zero 
response rate in untreated material. Finney (1952, pp. 88-91) has shown that when this ‘natural response 
rate’ is well determined and not too large, its effect can be allowed for by two simple modifications to 
the ordinary process. In the first place, the observed proportions responding, P, are adjusted by the 
so-called Abbott’s correction to give adjusted proportions P’ = (P−C)/(1−C), where C is the natural 
response rate; secondly, the weights have to be multiplied by a factor P’/{P’+C/(1−C)}. It is of some 
interest that these alterations can be taken care of by modifying the tables of weights and weighted 
probits, thus enabling the original machine programme to be used unchanged. 

It is somewhat simpler to work in terms of Normal Equivalent Deviates rather than probits, the 
addition of 5 being in any case irrelevant when working on automatic computers. The necessary altera- 
tions to the weights, w, are taken care of by inserting a suitably modified table. The working probit, y, 


is given by
    $y = Q'y_0 + P'y_1,$
where P’ is the adjusted proportion responding, Q’ = 1−P’ and $y_0$, $y_1$ are the minimum and maximum 
working probits. In terms of the observed proportion responding, this gives
    $y = \frac{Q}{1-C}\,y_0 + \frac{P-C}{1-C}\,y_1,$
so that the required value, nwy, is given in terms of the weighted probits by
    $nwy = \frac{s\pi_0}{1-C} + \frac{r\pi_1}{1-C} - \frac{(r+s)C\pi_1}{1-C} = s\,\frac{\pi_0 - C\pi_1}{1-C} + r\pi_1.$

Thus the form of the computation remains unaltered if we use the adjusted weighted probits
    $\pi_0' = (\pi_0 - C\pi_1)/(1-C)$
and $\pi_1$, where the modified weights are used in computing $\pi_0$ and $\pi_1$ as functions of Y. 

Tables of w, $\pi_0'$ and $\pi_1$ have been prepared covering the range of N.E.D.’s −4(0.125)+4, with 
C = 0(1)9%. Four-point interpolation in these tables (a convenient method is that described in Fisher 
& Yates (1953, p. 33)) gives 5-decimal accuracy over almost the whole range. Similar tables for the logit 
and angle function are in preparation. 
Treatment variances for experimental designs with serially correlated observations 


By J. C. BUTCHER 


Applied Mathematics Laboratory, New Zealand Department of Scientific and Industrial Research 


1. INTRODUCTION AND SUMMARY 


Williams (1952) has considered the design of field experiments in which the fertilities of neighbouring 
plots are assumed to follow a linear autoregressive scheme of order one or two. Here a formal generaliza- 
tion is made and a notation introduced which simplifies the calculation of variances of estimates. 

A model of the following form is assumed: 


    $x_t + a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_p x_{t-p} = \epsilon_t,$   (1)
where the $\epsilon_t$ are independent and distributed $N(0, \sigma^2)$. For convenience this will be written as
    $\sum_{i=0}^{p} a_i x_{t-i} = \epsilon_t,$   (2)
it being understood that $a_0$ is to be replaced by 1 when written in full. 
The yield is assumed to be equal to the sum of $x_t$ and a term depending on the treatment, i.e.
    $y_t = x_t + b_{(t)},$   (3)

where the notation (t) denotes the number of the treatment used on the tth plot. There are c treatments. 

Estimation equations are derived, and it is found that the information matrix factorizes in a manner 
(20) which simplifies the calculation of the variance of the difference of the estimates of treatment 
constants. Some examples are given. 


2. NOTATION 


A square matrix of order c all of whose elements are unity will be denoted by E. Consider matrices of 
the form $\alpha I + \beta E$. As $E^2 = cE$ we see that
    $(\alpha_1 I + \beta_1 E)(\alpha_2 I + \beta_2 E) = \alpha_1\alpha_2 I + (\alpha_1\beta_2 + \alpha_2\beta_1 + c\beta_1\beta_2)E.$   (4)
If $\alpha_2 I + \beta_2 E$ is the inverse of $\alpha_1 I + \beta_1 E$, then
    $\alpha_2 = \frac{1}{\alpha_1}, \qquad \beta_2 = \frac{-\beta_1}{\alpha_1(\alpha_1 + c\beta_1)}.$   (5)

We are interested in the cases where the information matrix is $\alpha_1 I + \beta_1 E$ and the variance-covariance 
matrix $\alpha_2 I + \beta_2 E$. The variances will equal $\alpha_2 + \beta_2$ and the covariances will all equal $\beta_2$. Thus the 
variances of differences will equal
    $2\alpha_2 = 2/\alpha_1.$   (6)
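
A small numerical check of the inverse formula and of the variance of a difference may be helpful. The sketch below uses plain Python lists; the helper names are hypothetical and the numbers are arbitrary illustrative values.

```python
def alpha_beta_matrix(alpha, beta, c):
    """Return the c x c matrix alpha*I + beta*E as nested lists."""
    return [[(alpha if i == j else 0.0) + beta for j in range(c)]
            for i in range(c)]


def inverse_coefficients(alpha1, beta1, c):
    """Coefficients (alpha2, beta2) of the inverse of alpha1*I + beta1*E."""
    alpha2 = 1.0 / alpha1
    beta2 = -beta1 / (alpha1 * (alpha1 + c * beta1))
    return alpha2, beta2


def matmul(X, Y):
    """Product of two matrices held as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


c = 4
alpha1, beta1 = 2.5, -0.3          # arbitrary information-matrix coefficients
alpha2, beta2 = inverse_coefficients(alpha1, beta1, c)
product = matmul(alpha_beta_matrix(alpha1, beta1, c),
                 alpha_beta_matrix(alpha2, beta2, c))
# variance of a difference: 2*(variance - covariance) = 2*alpha2 = 2/alpha1
variance_of_difference = 2 * ((alpha2 + beta2) - beta2)
```

Here `product` is the identity matrix to rounding error, and `variance_of_difference` equals 2/α₁ whatever the value of β₁.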

The symbol $\delta_i^j$ is defined for the ith plot and the jth treatment as unity if that plot receives this treat- 
ment and zero otherwise. As each plot receives some treatment we have
    $\sum_j \delta_i^j = 1.$   (7)
Also we have such results as
    $\sum_{(t-i)=j} a_i = \sum_i a_i\,\delta_{t-i}^j.$   (8)

The number of plots receiving the jth treatment is
    $\sum_i \delta_i^j.$   (9)


3. ESTIMATION OF PARAMETERS 


The frequency function of the $\epsilon$’s is
    $\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\Bigl(-\tfrac12\sum_t \epsilon_t^2/\sigma^2\Bigr).$   (10)

Since the x’s and $\epsilon$’s are connected by the linear relation (1), the likelihood of the sample $(x_1, x_2, \dots, x_n)$ 
can be directly calculated from (10). The Jacobian of the transformation is unity. In order to avoid end 
effects which introduce irrelevant complications, we shall regard the values of $x_0, x_{-1}, \dots, x_{-p+1}$ as 
additional parameters to be estimated. Estimation of these by maximum likelihood immediately gives 
$\epsilon_1 = \epsilon_2 = \dots = \epsilon_p = 0$. To see the effect of correlations between the estimates of these x’s and of the 
other parameters, consider instead the $\epsilon_t$ (t = 1, 2, ..., p) which constitute independent linear combinations 
of the $x_t$ (t = 0, −1, ..., −p+1). We see that the estimates of these $\epsilon_t$ are independent of the estimates of 
the other parameters. 

We may thus maximize the expression for the conditional likelihood
    $-L = \tfrac12\sum_{t=p+1}^{n}\epsilon_t^2/\sigma^2 + n\log\sigma$   (11)
in order to estimate the parameters $a_i$, $b_j$. $\sigma^2$ is, of course, best estimated by dividing the residual sum of 
squares by n−2p−c. 


The estimates of the remaining parameters $\theta$ are the solutions of
    $\frac{\partial L}{\partial\theta} = -\frac{1}{\sigma^2}\sum_{t=p+1}^{n}\epsilon_t\,\frac{\partial\epsilon_t}{\partial\theta} = 0.$   (12)
This gives for $\theta = a_i$
    $\sum_{t=p+1}^{n}(y_{t-i} - b_{(t-i)})\,\epsilon_t = 0,$   (13)
and for $\theta = b_j$
    $\sum_{t=p+1}^{n}\Bigl(\sum_{(t-i)=j} a_i\Bigr)\epsilon_t = 0.$   (14)
By (8) this can be written
    $\sum_{t=p+1}^{n}\sum_{i=0}^{p} a_i\,\delta_{t-i}^j\,\epsilon_t = 0.$   (15)


These equations can be solved by successive approximation, taking the first approximation for the $b_j$ as 
the mean yield for the plots receiving the jth treatment. Using these values the $a_i$ can be approximated 
and their values used to improve the approximation of the $b_j$, and so forth. 


4. VARIANCES AND COVARIANCES OF ESTIMATES OF TREATMENT CONSTANTS 


The variance-covariance matrix of parameter estimates is the inverse of the information matrix, which 
is approximately given by
    $\left[E\left(-\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}\right)\right].$   (16)

We observe that if $\theta_i = b_k$, $\theta_j = a_l$ or $\sigma^2$, then
    $E\left(-\frac{\partial^2 L}{\partial\theta_i\,\partial\theta_j}\right) = 0.$   (17)
We may thus consider merely the matrix
    $\left[E\left(-\frac{\partial^2 L}{\partial b_i\,\partial b_j}\right)\right]$   (18)
as the information matrix for the b’s. 


Let V be the variance-covariance matrix for the b’s. We have
    $V^{-1} = \left[\frac{1}{\sigma^2}\sum_{t=p+1}^{n}\Bigl(\sum_{(t-k)=i} a_k\Bigr)\Bigl(\sum_{(t-l)=j} a_l\Bigr)\right].$   (19)
It is easily verified that
    $V^{-1} = (\Delta A)(\Delta A)'/\sigma^2,$   (20)
where
    $\Delta = (\delta_j^i)$   (i = 1, 2, ..., c; j = 1, 2, ..., n),   (21)
    $A = (a_{i-j})$   (i = 1, 2, ..., n; j = 1, 2, ..., n−p),   (22)


where $a_k$ is set equal to zero if k < 0 or k > p. That is, the information matrix can be expressed in terms of 
$\Delta$, which depends only on the design (and which characterizes the design), and of A, which depends only 
on the autoregressive coefficients. It seems reasonable to require that the design be symmetric with 
respect to all treatments, which leads to the requirement that $V^{-1}$ be of the form $\alpha I + \beta E$. One must thus 
place certain restrictions on $\Delta$. It is easily seen that the value of n−p must equal mc (m being an integer, 
the number of times each treatment occurs). As the last p treatments must be the same as the first p, 
we need only consider the mc treatments. In all that follows it is assumed that an additional p plots are 
used. It is clear that a repetition of a suitable design of mc plots any number of times to make a design 
of length Nmc is also suitable. 

Conditions for suitable designs are easily seen. Let $u_{ij}^{(r)}$ be the number of times the ith and the jth 
treatment occur r plots apart. Then the condition is that
    $(u_{ij}^{(r)}) = \alpha_r I + \beta_r E$   (23)
for all r from 1 to p. 
Suitable designs are given by Williams (1952) for the cases:
    p = 1, $\alpha_1 = 0$;   p = 1, $\alpha_1 + \beta_1 = 0$;   p = 2, $\alpha_1 + \beta_1 = \alpha_2 + \beta_2 = 0$,
which he calls respectively types (II b), (II a) and (III). 











It is clear that when designs suitable for p = 1 are found we can generalize them to any p by repeating 
each treatment over p adjacent plots, e.g. 

    1212... would become 11...1 22...2 11...1 22...2 ..., each run being of length p.

As designs for first-order autoregression where $\alpha_1 + \beta_1 = 0$ are easily constructed by writing a suitable 
succession of permutations of the c treatments, we thus have a method of finding designs for any number 
of lags. 

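As an example of the calculation of these matrices consider the case c = 3, p = 2, n = 3m+2: 

$\Delta A$ becomes
    $\begin{pmatrix} 1 & a_2 & a_1 & 1 & a_2 & a_1 & \dots \\ a_1 & 1 & a_2 & a_1 & 1 & a_2 & \dots \\ a_2 & a_1 & 1 & a_2 & a_1 & 1 & \dots \end{pmatrix}.$

Thus
    $V^{-1} = \frac{m}{\sigma^2}(1 + a_1^2 + a_2^2)\,I + \frac{m}{\sigma^2}(a_1 + a_2 + a_1 a_2)(E - I)$
    $\qquad = \frac{m}{\sigma^2}(1 + a_1^2 + a_2^2 - a_1 - a_2 - a_1 a_2)\,I + \frac{m}{\sigma^2}(a_1 + a_2 + a_1 a_2)\,E.$

Therefore the variance of the estimates of treatment differences, denoted by v, equals
    $v = \frac{2\sigma^2}{m}\cdot\frac{1}{1 - a_1 - a_2 + a_1^2 - a_1 a_2 + a_2^2}$   (24)
by (6). 
In the following examples the number of plots is N+p. $\rho = -a_1$ is written when p = 1. 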
Case: p = 1, c = 2:
    Design               $v = \mathrm{var}(b_1 - b_2)$
    [121212...]          $\frac{4\sigma^2}{N(1+\rho)^2},$   (25)
    [11221122...]        $\frac{4\sigma^2}{N(1+\rho^2)}.$   (26)
The relative efficiency of (25) and (26) is $1 + 2\rho/(1+\rho^2)$, which is greater than one if $\rho$ is positive. 
Case: p = 2, c = 2:
    Design               $v = \mathrm{var}(b_1 - b_2)$
    [121212...]          $\frac{4\sigma^2}{N(1 - a_1 + a_2)^2},$   (27)
    [11221122...]        $\frac{4\sigma^2}{N(1 + a_1^2 - 2a_2 + a_2^2)}.$   (28)
Consider now a more general case:
    11...1 22...2 11...1 22...2 ... 11...1 22...2, with successive run lengths $q_1, r_1, q_2, r_2, \dots, q_s, r_s$,
where $p \le q_1, r_1, \dots$; $\sum q = \sum r = s/\alpha = \tfrac12 N$. We have
    $v = \frac{4\sigma^2}{N\left[\left(\sum_{i=0}^{p} a_i\right)^2 - 2\alpha\sum_{i=0}^{p}\sum_{j=0}^{p} a_i a_j\,|i-j|\right]}.$   (29)


Thus if $\sum\sum a_i a_j\,|i-j| > 0$ it is advantageous to make the runs of 1’s and 2’s as long as possible; otherwise 
the q’s and r’s should all equal p. 
Case: c = 3, p = 2, n = N+2 (for case p = 1, n = N+1 substitute $a_2 = 0$, $a_1 = -\rho$): 

    Design                   $v = \mathrm{var}(b_1 - b_2)$
    [123...]                 $\frac{6\sigma^2}{N[1 - a_1 - a_2 + a_1^2 - a_1 a_2 + a_2^2]},$   (30)
    [123, 312, 231...]       $\frac{6\sigma^2}{N[1 - a_2 + a_1^2 + a_2^2]},$   (31)
    [123, 231, 312...]       $\frac{6\sigma^2}{N[1 - a_1 + a_2 + a_1^2 - a_1 a_2 + a_2^2]},$   (32)
    [11, 22, 33...]          $\frac{12\sigma^2}{N[2 + a_1 + 2a_1^2 + a_1 a_2 - 2a_2 + 2a_2^2]}.$   (33)
The last is a special case of the following:
    11...1 22...2 33...3 ..., each run being of length p, for which
    $v = \frac{6p\sigma^2}{N\left[p\left(\sum_{i=0}^{p} a_i\right)^2 - \tfrac32\sum_{i=0}^{p}\sum_{j=0}^{p} a_i a_j\,|i-j|\right]}.$   (34)


Cases (24), (25), (26) and (30) are included in formulae given by Williams. 
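
The tabulated variances can be checked numerically from the factorization (20). The sketch below is an illustration under stated assumptions, not the author's computation: the function name is hypothetical, the design is given as one full cycle of N plots, and the additional p plots are taken to repeat the first p treatments, as the text requires.

```python
def treatment_difference_variance(design, a, sigma2=1.0, tol=1e-9):
    """Variance of the difference of two treatment estimates.

    design : treatment labels for the N plots of the design
    a      : autoregressive coefficients [a_1, ..., a_p]
    Builds sigma^2 * V^{-1} column by column, checks that it has the form
    alpha*I + beta*E, and returns 2/alpha as in (6).
    """
    p = len(a)
    coef = [1.0] + list(a)                      # a_0 = 1
    labels = sorted(set(design))
    index = {label: i for i, label in enumerate(labels)}
    c = len(labels)
    seq = list(design) + list(design[:p])       # the additional p plots
    info = [[0.0] * c for _ in range(c)]
    for t in range(p, len(seq)):
        col = [0.0] * c                         # one column of the product
        for k in range(p + 1):
            col[index[seq[t - k]]] += coef[k]
        for i in range(c):
            for j in range(c):
                info[i][j] += col[i] * col[j] / sigma2
    diag = [info[i][i] for i in range(c)]
    off = [info[i][j] for i in range(c) for j in range(c) if i != j]
    if max(diag) - min(diag) > tol or max(off) - min(off) > tol:
        raise ValueError("design is not symmetric in the treatments")
    alpha1 = diag[0] - off[0]
    return 2.0 / alpha1


a1, a2 = 0.3, -0.2
v_cyclic = treatment_difference_variance([1, 2, 3] * 4, [a1, a2])
v_paired = treatment_difference_variance([1, 1, 2, 2] * 3, [-0.5])
```

For these inputs `v_cyclic` agrees with the expression for the design [123...] with N = 12, and `v_paired` with the p = 1 expression for [11221122...] with ρ = 0.5.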


In conclusion I wish to express my thanks to Dr P. Whittle who aroused my interest in this subject. 
The multivariate distribution of complex normal variables 


By R. A. WOODING 


Department of Scientific and Industrial Research, Wellington, New Zealand 


INTRODUCTION 


The expression for the distribution of several normal real variables with correlation was introduced for 
the first time by Bravais (1846). Since then it has been discussed by many writers, notably Pearson 
(1896) and Cramér (1937, Chapter X). 

Its applications include the case when the components of the multivariate distribution are real 
stochastic time variables which obey the normal law, as in random noise (Rice, 1944, 1945). 

However, for certain problems connected with the envelope of a random noise signal, it is advantageous 
to consider a complex form, of which the signal is a component (Bunimovitch, 1949). A logical step is to 
derive the equivalent distribution of an ensemble of complex variates for which the real and imaginary 
components are normally distributed. It is found that, provided the real and imaginary components 
$x_n$, $y_n$ obey the covariance relations
    $E(x_m x_n) = E(y_m y_n), \qquad E(x_m y_n) = -E(x_n y_m),$
the distribution of N complex variates $v_n$ has the simple form
    $\pi^{-N}\,|L|^{-1}\exp(-V'^{*}L^{-1}V),$


where L is the Hermitian covariance matrix. 


Since this does not appear to have been published before, a derivation based on the real-variable 
expression is given here. 
DEFINITION OF COMPLEX VARIATES 


Consider N pairs of real variables $x_n(t)$, $y_n(t)$ $(1 \le n \le N)$, where t represents a sampling parameter. Let 
the complex variable
    $v_n(t) = x_n(t) + i\,y_n(t)$   (1)
be such that it can be written in the form
    $v_n(t) = \sum_j (a_{nj} - i\,b_{nj})\exp\{i\theta_j(t)\},$   (2)
in which the $a_{nj}$, $b_{nj}$ are real coefficients. This complex Fourier series arises in numerous fields, particularly 
in theory related to time series. 
From (1) and (2) we see that
    $x_n(t) = \sum_j \{a_{nj}\cos\theta_j(t) + b_{nj}\sin\theta_j(t)\},$
    $y_n(t) = \sum_j \{a_{nj}\sin\theta_j(t) - b_{nj}\cos\theta_j(t)\};$
i.e. the $x_n$, $y_n$ are in ‘phase quadrature’. This property is expressed compactly by the reciprocal Hilbert 
transformations (Bunimovitch, 1949, p. 1231)
    $y_n(t) = \frac{1}{\pi}\,\mathrm{P}\!\!\int_{-\infty}^{\infty}\frac{d\sigma}{\sigma}\,x_n(t-\sigma),$   (3a)
    $x_n(t) = \frac{1}{\pi}\,\mathrm{P}\!\!\int_{-\infty}^{\infty}\frac{d\sigma}{\sigma}\,y_n(t+\sigma),$   (3b)
where $\mathrm{P}\!\int$ signifies the Cauchy principal value of the integral. 


COVARIANCE RELATIONS 


Using (3a, 3b), it is readily shown that the $x_n$, $y_n$ satisfy the following covariance relations, which are 
fundamental:
    $E(y_m y_n) = \frac{1}{\pi}\,\mathrm{P}\!\!\int_{-\infty}^{\infty}\frac{d\sigma}{\sigma}\,E[x_m(t-\sigma)\,y_n(t)]$
    $\qquad\quad\;\; = \frac{1}{\pi}\,\mathrm{P}\!\!\int_{-\infty}^{\infty}\frac{d\sigma}{\sigma}\,E[x_m(t)\,y_n(t+\sigma)]$
    $\qquad\quad\;\; = E(x_m x_n).$   (4a)
Similarly,
    $E(x_m y_n) = -E(x_n y_m).$   (4b)


It will be seen that, if the relations (4a, 4b), together with the standardizing relations $E(x_n) = E(y_n) = 0$, 
are taken as the initial postulate, our definition of the complex variates will not involve Fourier series 
concepts. 

If now we consider the complex column vector 


V=X+iY (5) 


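formed from the $v_n$ in (1), the composite column vector {X Y} will have a covariance matrix of form
    $\Lambda = \begin{pmatrix} A & B \\ -B & A \end{pmatrix},$   (6)
where the submatrix A is symmetric and B is skew-symmetric. It is easily verified that a decomposition 
of the corresponding inverse matrix is
    $\begin{pmatrix} A & B \\ -B & A \end{pmatrix}^{-1} = \begin{pmatrix} (A + BA^{-1}B)^{-1} & -(A + BA^{-1}B)^{-1}BA^{-1} \\ (A + BA^{-1}B)^{-1}BA^{-1} & (A + BA^{-1}B)^{-1} \end{pmatrix},$   (7)
which is of the same form as (6), so we can write
    $\Lambda^{-1} = \begin{pmatrix} P & Q \\ Q' & P \end{pmatrix} = \begin{pmatrix} P & Q \\ -Q & P \end{pmatrix},$   (8)
with
    $P = (A + BA^{-1}B)^{-1},$   (9a)
    $Q = -(A + BA^{-1}B)^{-1}BA^{-1},$   (9b)
and
    $P = P', \qquad Q = -Q'.$   (9c)

Then it follows from formula (9) that
    $P - iQ = (A - iB)^{-1}.$   (10)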
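
The last step can be verified in two lines. The following worked check assumes only the block forms read off above, namely that the covariance matrix has blocks A, B, −B, A and that its inverse has blocks P, Q, −Q, P:

```latex
\begin{aligned}
\Lambda\Lambda^{-1}
  &= \begin{pmatrix} A & B \\ -B & A \end{pmatrix}
     \begin{pmatrix} P & Q \\ -Q & P \end{pmatrix}
   = \begin{pmatrix} AP - BQ & AQ + BP \\ -(AQ + BP) & AP - BQ \end{pmatrix}
   = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}, \\
  &\Longrightarrow\quad AP - BQ = I, \qquad AQ + BP = 0, \\
(A - iB)(P - iQ) &= (AP - BQ) - i\,(AQ + BP) = I .
\end{aligned}
```

Hence P − iQ is the inverse of A − iB, without any need to evaluate P and Q explicitly.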


NORMAL VARIATES 


When the real vectors X, Y are normal, the likelihood function of the composite vector {X Y} is
    $(2\pi)^{-N}\,|\Lambda|^{-1/2}\exp[-\tfrac12 (X'\,Y')\,\Lambda^{-1}\{X\,Y\}],$   (11)
for which the exponent may be written
    $-\tfrac12 (X'\,Y')\,\Lambda^{-1}\{X\,Y\} = -\tfrac12 (X' - iY')(P - iQ)(X + iY),$   (12)
since Q is skew-symmetric, while the decomposition
    $\begin{pmatrix} P & Q \\ Q' & P \end{pmatrix} = \begin{pmatrix} I & -iI \\ 0 & I \end{pmatrix}\begin{pmatrix} P - iQ & 0 \\ -Q & P + iQ \end{pmatrix}\begin{pmatrix} I & iI \\ 0 & I \end{pmatrix}$
shows that
    $|\Lambda|^{-1} = |P - iQ|\,|P + iQ| = |P - iQ|^2 = |A - iB|^{-2}.$   (13)
Combining the expressions (10)–(13), we now obtain the result
    $(2\pi)^{-N}\,|A - iB|^{-1}\exp[-\tfrac12 (X' - iY')(A - iB)^{-1}(X + iY)]$   (14)
for the likelihood of the complex normal vector X + iY. But since the covariance matrix of the complex 
vector V = X + iY is the Hermitian matrix
    $L = E(VV'^{*}) = 2(A - iB),$   (15)
it follows that
    $|L| = 2^{N}\,|A - iB|,$   (16)
and that (14) can be expressed in the alternative form
    $\pi^{-N}\,|L|^{-1}\exp(-V'^{*}L^{-1}V)$   (17)
for the normal probability function of the N complex numbers $v_n$. 


It will be noted that this simple result is true only if the relations (4a) and (4b) hold. 
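
For a single complex variate (N = 1) the skew-symmetric block B vanishes, so L = 2A and the complex form must reduce to the product of two independent real normal densities of variance A. A minimal numerical sketch of this special case, with hypothetical function names:

```python
import math


def complex_normal_density(v, L):
    """Density pi^{-1} L^{-1} exp(-|v|^2 / L) of one complex normal variate,
    where L = E|v|^2 is real and positive."""
    return math.exp(-abs(v) ** 2 / L) / (math.pi * L)


def real_pair_density(x, y, A):
    """Joint density of independent real normal x, y, each of variance A."""
    return math.exp(-(x * x + y * y) / (2.0 * A)) / (2.0 * math.pi * A)


A = 0.7
complex_value = complex_normal_density(complex(0.3, 0.4), 2.0 * A)
real_value = real_pair_density(0.3, 0.4, A)
```

The two values coincide, which illustrates how the factor 2 in L = 2(A − iB) turns (2π)^{-N} into π^{-N}.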


THE CHARACTERISTIC FUNCTION 


For the density function (11), the characteristic function can be written
    $(2\pi)^{-N}\,|\Lambda|^{-1/2}\int_X\!\int_Y \exp[i(R'\,S')\{X\,Y\} - \tfrac12 (X'\,Y')\,\Lambda^{-1}\{X\,Y\}]\,dX\,dY,$   (18)
if $dX\,dY = \prod_{n=1}^{N} dx_n\,dy_n$ and {R S} is a real column vector. This becomes
    $\exp[-\tfrac12 (R'\,S')\,\Lambda\{R\,S\}] = \exp[-\tfrac12 (R' - iS')(A - iB)(R + iS)],$   (19)
since B is skew-symmetric. Hence, in the complex-variate notation (18) and (19) give
    $\pi^{-N}\,|L|^{-1}\int_V \exp[i\,\mathrm{Rl}(T'^{*}V) - V'^{*}L^{-1}V]\,dV = \exp(-\tfrac14 T'^{*}LT),$   (20)
where dV = dX dY, T = R + iS, and Rl signifies that the real part is taken. 


It is evident from the above that the theory of the distribution of complex variates is most readily 
deduced from real variate theory. 


The author wishes to acknowledge the valuable advice and suggestions of Dr P. Whittle, of Applied 
Mathematics Laboratory, N.Z.D.S.I.R., Wellington, and to thank N.Z.D.S.I.R. for permission to 
publish this short paper. 
Stationarity conditions for stochastic processes of the autoregressive 
and moving-average type 


By J. WISE 


University of Birmingham 


INTRODUCTION AND SUMMARY* 


The criterion for the stationarity of a process of the autoregressive and moving-average types has been 
given by Wold (1954) in the form that the roots of a specified equation shall lie inside the unit circle. 
A generalization of this result has been given by Doob (1953), who showed that a stationary stochastic 
process of the autoregressive or moving-average type has a rational spectral density function g(z), where
    $g(z) = \frac{A(z)\,A(z^{-1})}{B(z)\,B(z^{-1})},$   (1)
A(z) and B(z) being polynomials in z with no common factors, the roots of the equation
    $B(z) = 0$   (2)
all lying inside the unit circle, and no root of the equation
    $A(z) = 0$   (3)
lying outside the unit circle. 

In this paper these results due to Wold and Doob are converted into a set of criteria which are 
determined directly from the coefficients of the polynomials concerned. No intermediate evaluation of 
the roots of the equation is necessary. Thus a set of necessary and sufficient conditions for the stationarity 
of a stochastic process of the autoregressive or moving-average type is derived explicitly in terms of the 
coefficients of the process. These criteria are expressed in a determinantal form permitting systematic 
treatment in the general case. All the elements of the determinants involved, for polynomials up to (and 
including) the fourth degree, are given, and these determinants are given in expanded form. In addition, 
a set of necessary (but not sufficient) conditions are given for stationarity, which do not involve deter- 
minants and which are of a very simple algebraic form. 


DERIVATION OF THE CRITERIA 
Let us consider the autoregressive moving-average process 


    $x_t + \alpha_1 x_{t-1} + \dots + \alpha_h x_{t-h} = \epsilon_t + \beta_1\epsilon_{t-1} + \dots + \beta_l\epsilon_{t-l}.$   (4)


* This paper is based on material contained in Wise (1955a). 
We may assume, without loss of generality, that the polynomials
    $z^h + \alpha_1 z^{h-1} + \dots + \alpha_h$   and   $z^l + \beta_1 z^{l-1} + \dots + \beta_l$   (5)
have no common factor. 


Then, if the process (4) is stationary, all the roots of the equation
    $z^h + \alpha_1 z^{h-1} + \dots + \alpha_h = 0$   (6)
lie inside the unit circle, and none of the roots of the equation
    $z^l + \beta_1 z^{l-1} + \dots + \beta_l = 0$   (7)
lies outside the unit circle if the process is non-circular, while all the roots lie inside the unit circle if the 
process is circular.* Let us consider the polynomial equation
    $x^k + \alpha_1 x^{k-1} + \dots + \alpha_k = 0.$   (8)

The roots of this equation are given by the reduction of the polynomial to real or complex linear factors 
such that
    $(x - x_1)(x - x_2)\dots(x - x_k) = 0.$   (9)
We now require the necessary and sufficient conditions for the satisfaction of the inequalities
    $|x_j| < 1 \quad (j = 1, 2, 3, \dots, k).$   (10)
Put
    $y = \frac{x+1}{x-1}$   and   $x = a + ib$   (11)
(where a and b are real). We find
    $y = \frac{a^2 + b^2 - 1}{(a-1)^2 + b^2} - \frac{2ib}{(a-1)^2 + b^2}$   (12)
    $\;\;\, = R(y) + iC(y),$   (13)
where R(y) denotes the real part and iC(y) denotes the complex part of y. We have
    $R(y) = \frac{a^2 + b^2 - 1}{(a-1)^2 + b^2},$   (14)
    $C(y) = \frac{-2b}{(a-1)^2 + b^2},$   (15)
    $|x| = +\sqrt{(a^2 + b^2)},$   (16)
    $(a-1)^2 + b^2 > 0.$   (17)
Thus from (14), (16) and (17) it follows that it is necessary and sufficient for |x| < 1 that
    $R(y) < 0.$   (18)
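
The equivalence between |x| < 1 and R(y) < 0 is easy to confirm numerically. A minimal sketch follows; the function names and the sample points are illustrative assumptions only.

```python
def to_half_plane(x):
    """The substitution y = (x + 1)/(x - 1); x must not equal 1."""
    return (x + 1) / (x - 1)


def to_disc(y):
    """The inverse substitution x = (y + 1)/(y - 1); y must not equal 1."""
    return (y + 1) / (y - 1)


sample_points = [0.5 + 0.2j, -0.9j, 2 - 1j, -3 + 0j, 0.1 + 0j]
# (inside the unit circle?, real part of y negative?) for each sample point
agreement = [(abs(x) < 1, to_half_plane(x).real < 0) for x in sample_points]
```

Each pair in `agreement` has two equal entries, and applying `to_disc` to `to_half_plane(x)` returns x, since the substitution is its own inverse.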
From (11) we obtain
    $x = \frac{y+1}{y-1},$   (19)
so that (10) is satisfied if, and only if,
    $R(y_j) < 0 \quad (j = 1, 2, \dots, k),$   (20)
where $y_j$ (j = 1, 2, ..., k) are the roots of the equation
    $(y+1)^k + \alpha_1 (y+1)^{k-1}(y-1) + \dots + \alpha_k (y-1)^k = 0,$   (21)
which is obtained by substituting for x from (19) into (8). Expressing (21) in ascending powers of y 
we obtain
    $p_0 + p_1 y + p_2 y^2 + \dots + p_k y^k = 0,$   (22)
where
    $p_r = \sum_{j=0}^{k} \alpha_j c_{rj},$   (23)

$\alpha_0 = 1$ and $c_{rj}$ is the coefficient of $y^r$ in the expression $(y+1)^{k-j}(y-1)^j$. 
The values of the coefficients $c_{rj}$ are tabulated below for k = 1, 2, 3, 4. 
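
The coefficients can be generated mechanically by polynomial multiplication. The sketch below is an illustration with hypothetical function names; polynomials are held as coefficient lists in ascending powers of y, and the p-coefficients are returned before division by p₀.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_pow(base, n):
    """Raise a polynomial to a non-negative integer power."""
    out = [1]
    for _ in range(n):
        out = poly_mul(out, base)
    return out


def c_table(k):
    """Row j holds c_rj for r = 0..k: coefficients of (y+1)^(k-j) (y-1)^j."""
    return [poly_mul(poly_pow([1, 1], k - j), poly_pow([-1, 1], j))
            for j in range(k + 1)]


def p_coefficients(alphas):
    """Return [p_0, ..., p_k] for alphas = [alpha_1, ..., alpha_k],
    before normalization by p_0."""
    k = len(alphas)
    full = [1] + list(alphas)                   # alpha_0 = 1
    table = c_table(k)
    return [sum(full[j] * table[j][r] for j in range(k + 1))
            for r in range(k + 1)]
```

For k = 2 this gives p₀ = 1 − α₁ + α₂, p₁ = 2(1 − α₂) and p₂ = 1 + α₁ + α₂, so that `p_coefficients([2, 3])` is `[2, -4, 6]`.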


* See Wise (1955) for a discussion of circular and non-circular processes of the autoregressive and 
moving-average type. 
From these the coefficients $p_1, \dots, p_k$ are tabulated for k = 1, 2, 3, 4, expressed in terms of the coefficients 
$\alpha_1, \alpha_2, \alpha_3, \alpha_4$, the coefficient $p_0$ having been set equal to unity. 

Necessary and sufficient conditions for (20) to be satisfied, expressed as criteria involving only the 
coefficients in equation (22), have been derived by Routh (1930) and are given in convenient form below. 

The criteria are obtained as follows. We form the matrix P, where
    $P = \begin{pmatrix} p_1 & p_3 & p_5 & p_7 & \dots \\ p_0 & p_2 & p_4 & p_6 & \dots \\ 0 & p_1 & p_3 & p_5 & \dots \\ \vdots & & & & \end{pmatrix}.$   (24)

The Routh conditions then state that (20) is satisfied, and thus (10) is satisfied, if, and only if, the 
following inequalities are satisfied ($p_0$ having been put equal to unity):
    $P_j = p_j > 0 \quad (j = 1 \text{ and } j = k),$
    $P_r > 0 \quad (r = 1, 2, \dots, k-1),$   (25)
where $P_r$ (r = 1, 2, ..., k−1) is the principal minor of P formed from the first r rows and the first r columns. 
The principal minor formed from the first k rows and columns equals $p_k P_{k-1}$, and since both this and 
$P_{k-1}$ must be positive, this implies that $p_k$ is positive. A set of necessary, but not sufficient, conditions 
for (10) is given by the inequalities
    $p_r > 0 \quad (r = 1, 2, \dots, k).$   (26)


Expressions for $p_r$ (r = 1, 2, ..., k) are given below, in terms of the coefficients of equation (8), for 
k = 1, 2, 3, 4. This covers nearly all cases likely to occur in practice.* From these values the criteria 
given by (25) may be obtained directly. These criteria are given below, in expanded form, for k = 1, 2, 3 
and 4. The expansions of the determinants in terms of the coefficients of equation (8) become unwieldy 
for values of k larger than 4. For cases in which k exceeds 4, it is more convenient to work with the 
P criteria in determinantal form. It is worth noting, furthermore, that the computational labour 
involved in evaluating the P criteria is small relative to the labour required to evaluate the roots of (8) 
numerically. 
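
For k larger than 4 the determinantal form is easily mechanized. The following sketch is one possible arrangement, not the author's procedure: the function names are hypothetical, exact rational arithmetic is used to avoid rounding questions, and a polynomial with p₀ = 0 (a root at x = −1) is simply reported as failing the test.

```python
from fractions import Fraction


def _poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def _p_coefficients(alphas):
    """Ascending coefficients of sum_j alpha_j (y+1)^(k-j) (y-1)^j."""
    k = len(alphas)
    full = [Fraction(1)] + [Fraction(a) for a in alphas]
    total = [Fraction(0)] * (k + 1)
    for j, alpha in enumerate(full):
        term = [Fraction(1)]
        for _ in range(k - j):
            term = _poly_mul(term, [Fraction(1), Fraction(1)])
        for _ in range(j):
            term = _poly_mul(term, [Fraction(-1), Fraction(1)])
        total = [t + alpha * c for t, c in zip(total, term)]
    return total


def _det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, v in enumerate(m[0]):
        if v != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * v * _det(minor)
    return total


def roots_inside_unit_circle(alphas):
    """Apply the conditions P_r > 0 (r < k) and p_k > 0 to
    x^k + alpha_1 x^(k-1) + ... + alpha_k."""
    p = _p_coefficients(alphas)
    if p[0] == 0:
        return False
    p = [v / p[0] for v in p]                   # set p_0 equal to unity
    k = len(alphas)

    def coef(i):
        return p[i] if 0 <= i <= k else Fraction(0)

    # entry in row i, column j (1-indexed) of the matrix P is p_{2j - i}
    P = [[coef(2 * (j + 1) - (i + 1)) for j in range(k)] for i in range(k)]
    minors = [_det([row[:r] for row in P[:r]]) for r in range(1, k)]
    return p[k] > 0 and all(m > 0 for m in minors)
```

For example, x² − 0.5x + 0.06 (roots 0.2 and 0.3) passes, while x² − 2.5x + 1 (roots 2 and 0.5) fails because p₁ vanishes.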


Values of $c_{rj}$ (rows j, columns r)

k = 1
    j \ r     0     1
      0       1     1
      1      -1     1

k = 2
    j \ r     0     1     2
      0       1     2     1
      1      -1     0     1
      2       1    -2     1

k = 3
    j \ r     0     1     2     3
      0       1     3     3     1
      1      -1    -1     1     1
      2       1    -1    -1     1
      3      -1     3    -3     1

k = 4
    j \ r     0     1     2     3     4
      0       1     4     6     4     1
      1      -1    -2     0     2     1
      2       1     0    -2     0     1
      3      -1     2     0    -2     1
      4       1    -4     6    -4     1

* For more extensive tabulations for k = 1, 2, ..., 10, see Wise (1955a). 
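Tables of the p-coefficients 

k = 1:
    $p_1 = \frac{1 + \alpha_1}{1 - \alpha_1}, \qquad p_2 = 0.$

k = 2:
    $p_1 = \frac{2(1 - \alpha_2)}{1 - \alpha_1 + \alpha_2}, \qquad p_2 = \frac{1 + \alpha_1 + \alpha_2}{1 - \alpha_1 + \alpha_2}.$

k = 3:
    $p_1 = \frac{3 - \alpha_1 - \alpha_2 + 3\alpha_3}{1 - \alpha_1 + \alpha_2 - \alpha_3},$
    $p_2 = \frac{3 + \alpha_1 - \alpha_2 - 3\alpha_3}{1 - \alpha_1 + \alpha_2 - \alpha_3},$
    $p_3 = \frac{1 + \alpha_1 + \alpha_2 + \alpha_3}{1 - \alpha_1 + \alpha_2 - \alpha_3}.$

k = 4:
    $p_1 = \frac{2(2 - \alpha_1 + \alpha_3 - 2\alpha_4)}{1 - \alpha_1 + \alpha_2 - \alpha_3 + \alpha_4},$
    $p_2 = \frac{2(3 - \alpha_2 + 3\alpha_4)}{1 - \alpha_1 + \alpha_2 - \alpha_3 + \alpha_4},$
    $p_3 = \frac{2(2 + \alpha_1 - \alpha_3 - 2\alpha_4)}{1 - \alpha_1 + \alpha_2 - \alpha_3 + \alpha_4},$
    $p_4 = \frac{1 + \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4}{1 - \alpha_1 + \alpha_2 - \alpha_3 + \alpha_4}.$
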
Tables of the P-criteria 
PL=p, 
-_ 
thug 
7 
Pi =P 
2(1—«@,) 
= —— > YU, 
(1— a, + aq) 
P,=p 
— (b+ +e) 
~ (Ta, +a) 
Pi=7y 


_ (3— a, — &%_ + 3a) 
(1— a +%_—s) 





>0, 


P,=PiP2—Ps 
ee Hy — hy + Bag) (3 + a, — &y — Bary) 


1+4,+4,+a 
_U +a +%s+%)_ 9 





(1—a, +a,—a,)? 
P3=DPs 
_ (L+ 0, +, +H) 
(1— a, +a, — as) 


(L—@, +, — as) 
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Pi=P, 
2(2—a, +a, — 204) 
~ (1— ay + Oy — Hy +.) 
P, = PiP2—Ps 
_ 4(2— a, + % — 2a) (8 — Ag+) 2(2+a,— A — 24) 


(1—@,+G@_,—G3 +4)? (1— a, + G_— 3 +4) 








P; = Pi P2P3— PiPs— 
_ 8(2— a, +03 — 204) (3 — A, + 3x4) (2+, — Oy — 2er4) 
(1, +0%_—&, +,)8 





_ 42 = Oy + Og — Deeg)? (L +, + Hy +g +04) 
(1— a, + %_— 3 +a4)8 
4(2+a,—a,—2a,)? 
(1—@, + G%_—3 +4)? 








(1+, +O, +3 + %4) 
(1— a, + Hy — ay + 4) 





= >0. 
Some properties of an angular transformation for the correlation coefficient 


By B. I. HARLEY 
University College London 


1. Sheppard (1899), in an early paper in which he discussed the association between two variables, 
used a quantity which is equivalent to the arc sine of the correlation coefficient. As far as the writer 
knows, the properties of such a quantity have not been discussed, although Jenkins (1954), following 
suggestions put forward by others, did investigate the arc sine of the serial correlation coefficient. It is 
the purpose of this present note to take this early work of Sheppard a stage further and in particular to 
show that if r is the product-moment correlation in the sample, and $\rho$ that in the normal population, then
    $\mathcal{E}(\sin^{-1} r) = \sin^{-1}\rho.$

2. Let $y = \sin^{-1} r$, 


and expand in a Taylor series about the point $r = \rho$. Using the moments of r expressed in the form of 
a series, we have on taking expectations and retaining terms of order n-* that 


    $\mathcal{E}(y) = \sin^{-1}\rho,$   (1)
1-—p? l 1 
ey) = P14 espe g(r sets seo, (2) 
2 
Kely) = — POS ss {14s (3-442 , (3) 
a1 





ky) = =P rropt— 0, (4) 
from which we deduce 
9p? 5p? 2 
By) = nee {14 ES 7+ 139+ 5p8)| +. (5) 
n n mn 
3 2 - 
Baly) = 3+—(10p?—1))1—— (2 +p?) +...1. (6) 


For large n the quantity $\sin^{-1} r$ is therefore approximately normally distributed. 


3. R. A. Fisher (1921) gave the normalizing transformation
    $z = \tfrac12\log_e\frac{1+r}{1-r} = \tanh^{-1} r,$
for which (making use of the correction given by A. K. Gayen (1951)) we have the moments
    $\mathcal{E}(z) = \tanh^{-1}\rho + \frac{\rho}{2(n-1)}\left\{1 + \frac{5 + \rho^2}{4(n-1)} + \dots\right\},$   (7)
    $\kappa_2(z) = \frac{1}{n-1}\left\{1 + \frac{4 - \rho^2}{2(n-1)} + \frac{22 - 6\rho^2 - 3\rho^4}{6(n-1)^2} + \dots\right\},$   (8)
and the $\beta$ coefficients
    $\beta_1(z) = \frac{\rho^6}{(n-1)^3} + \dots,$   (9)
    $\beta_2(z) = 3 + \frac{2}{n-1} + \frac{4 + 2\rho^2 - 3\rho^4}{(n-1)^2} + \dots.$   (10)


Table 1. Comparison of the values of $\beta_1$ and $\beta_2$ for the distributions of r, z and y 

n = 20

    $\rho$     $\beta_1(r)$   $\beta_1(z)$        $\beta_1(y)$   $\beta_2(r)$   $\beta_2(z)$   $\beta_2(y)$
    0.1   0.0148   10^-9 x 0.1458   0.0044   2.7406   3.1164   2.9281
    0.2   0.0600   10^-8 x 0.9331   0.0175   2.8213   3.1166   2.9522
    0.3   0.1386   10^-6 x 0.1063   0.0397   2.9623   3.1168   2.9921
    0.4   0.2557   10^-6 x 0.5972   0.0716   3.1749   3.1170   3.0470
    0.5   0.4202   10^-5 x 0.2278   0.1136   3.4783   3.1172   3.1162
    0.6   0.6464   10^-5 x 0.6802   0.1666   3.9055   3.1173   3.1986
    0.7   0.9584   10^-4 x 0.1715   0.2314   4.5131   3.1171   3.2929
    0.8   1.4001   10^-4 x 0.3822   0.3091   5.4154   3.1165   3.3974
    0.9   2.0603   10^-4 x 0.7748   0.4004   6.8681   3.1154   3.5105


Apart from its normalizing properties, the great practical advantage of the z transformation is that 
it gives a variance which, to order $(n-1)^{-1}$, is independent of $\rho$. The y transformation does not have this 
property but it provides a much neater value for the expectation, which is also independent of n. Without 
going into the question of the application of the results, it is proposed to make some numerical com- 
parisons of the properties of the $\sin^{-1} r$ and $\tanh^{-1} r$ distributions. 

A comparison of the values of $\beta_1(z)$ and $\beta_2(z)$ with $\beta_1(y)$ and $\beta_2(y)$ for n = 20 and $\rho$ = 0.1 (0.1) 0.9 is 
given in Table 1. The values of $\beta_1(r)$ and $\beta_2(r)$, as given in Tables for Statisticians and Biometricians, 
vol. 2, are given for completeness. It will be noted that $\beta_1(z)$ is much smaller than $\beta_1(y)$ and this will be 
true for all values of $\rho$ for n > 5. $\beta_2(z)$ is more nearly independent of $\rho$ than $\beta_2(y)$, which follows more closely the 
values for the r distribution. From an examination of the results it seems unlikely that the y transforma- 
tion will have a better normalizing effect than the z transformation. 
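
The z columns of Table 1 can be reproduced from the leading terms of the β coefficients. The sketch below assumes that equations (9) and (10) read β₁(z) = ρ⁶/(n−1)³ and β₂(z) = 3 + 2/(n−1) + (4 + 2ρ² − 3ρ⁴)/(n−1)², with higher-order terms neglected; the function names are hypothetical.

```python
def beta1_z(rho, n):
    """Leading term of beta_1 for z = arctanh(r)."""
    return rho ** 6 / (n - 1) ** 3


def beta2_z(rho, n):
    """beta_2 for z = arctanh(r), to order (n - 1)^-2."""
    return 3 + 2 / (n - 1) + (4 + 2 * rho ** 2 - 3 * rho ** 4) / (n - 1) ** 2


# (rho, beta_1(z), beta_2(z)) for n = 20 and rho = 0.1 (0.1) 0.9
table_rows = [(k / 10, beta1_z(k / 10, 20), beta2_z(k / 10, 20))
              for k in range(1, 10)]
```

Rounded to four decimals, the β₂(z) values run from 3.1164 at ρ = 0.1 to 3.1154 at ρ = 0.9, which makes the near-independence of ρ easy to see.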


4. To compare the probability integral of r obtained by using the $\sin^{-1} r$ transformation with the 
values obtained by using the $\tanh^{-1} r$ transformation we take as a first approximation
    $y = \sin^{-1} r, \qquad \mathcal{E}(y) = \sin^{-1}\rho, \qquad \kappa_2(y) = (1-\rho^2)/(n-2),$
the divisor, n−2, for the variance giving a good approximation to the expansion (2) when $\rho$ is small. 
Using this mean and variance, a normal curve can be fitted and the probability integral of r obtained 
for various values of n and p. As a second approximation a normal curve can be fitted using the values for 
the mean and variance given in equations (1) and (2) respectively. 


Table 2. Comparison of exact and approximate values of the probability integral of r 

(i) n = 25, $\rho$ = 0.2

      r      Exact      From z        From z        From y        From y
             value      1st approx.   2nd approx.   1st approx.   2nd approx.
    -0.2    0.02748    0.02860       0.02716       0.02435       0.02439
    -0.1    0.07379    0.07758       0.07444       0.06999       0.07005
     0      0.16364    0.17083       0.16542       0.16217       0.16225
     0.1    0.30630    0.31551       0.30807       0.31020       0.31025
     0.2    0.49168    0.50000       0.49179       0.50000       0.50000
     0.3    0.68635    0.69177       0.68466       0.69350       0.69344
     0.4    0.84700    0.84994       0.84533       0.84818       0.84810
     0.5    0.94615    0.94798       0.94592       0.94263       0.94258
     0.6    0.98820    0.98928       0.98875       0.98477       0.98475
     0.7    0.99877    0.99909       0.99903       0.99752       0.99751

(ii) n = 25, $\rho$ = 0.6

      r      Exact      From z        From z        From y        From y
             value      1st approx.   2nd approx.   1st approx.   2nd approx.
     0.2    0.00934    0.01072       0.00884       0.00402       0.00424
     0.3    0.03097    0.03598       0.03078       0.02112       0.02186
     0.4    0.09046    0.10311       0.09147       0.08216       0.08365
     0.5    0.22771    0.24994       0.22971       0.23614       0.23769
     0.6    0.47500    0.50000       0.47521       0.50000       0.50000
     0.7    0.77782    0.79299       0.77584       0.78544       0.78381
     0.75   0.89652    0.90531       0.89543       0.88995       0.88833
     0.8    0.96741    0.97140       0.96769       0.95555       0.95442
     0.85   0.99469    0.99586       0.99520       0.98722       0.98670


Similarly, if $z = \tanh^{-1} r$, we take as a first approximation
    $\mathcal{E}(z) = \tanh^{-1}\rho, \qquad \kappa_2(z) = 1/(n-3),$
and then, as a second approximation, take the values for the mean and variance given in equations (7) 
and (8) respectively, and in each case fit a normal curve with the appropriate mean and variance. The 
values of the probability integral of r obtained by these approximate methods, together with the exact 
values as given in F. N. David’s tables (1938), for the case where n = 25, $\rho$ = 0.2, 0.6 are given in Table 2. 

From an examination of the results it is seen that in the case of the z transformation, the inclusion of 
further terms of the expansions for the mean and variance of z increases the accuracy of the approxima- 
tions considerably, and the values using the second approximation for z are closer to the exact values 
than any of the other approximations. Increasing the number of terms of the expansion for the variance 
in the case of the y distribution makes very little difference, since the skewness of the y distribution 
prevents it from being as good as the z distribution. 
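
The first approximation for z described above can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, and it covers just the first approximation (mean $\tanh^{-1}\rho$, variance $1/(n-3)$), since the second-approximation moments of equations (7) and (8) are not restated here.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_integral_z_first(r: float, n: int, rho: float) -> float:
    """Approximate P(sample correlation <= r) by fitting a normal curve
    to z = atanh(r) with mean atanh(rho) and variance 1/(n - 3)."""
    mean = math.atanh(rho)
    sd = math.sqrt(1.0 / (n - 3))
    return normal_cdf((math.atanh(r) - mean) / sd)

# Two entries of the 'From z, 1st approx.' column of Table 2.
print(round(prob_integral_z_first(0.0, 25, 0.2), 5))
print(round(prob_integral_z_first(0.5, 25, 0.6), 5))
```

The two printed values can be compared with the rows r = 0 of Table 2 (i) and r = 0.5 of Table 2 (ii).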
5. Previously (1954) we have given the Cornish-Fisher (1937) form of the Edgeworth expansion
fitted to the distribution of $\tanh^{-1} r$, using the values of the first four cumulants to the full known
accuracy of their expansions. For the purposes of comparison we use the same series expansion to
approximate to the distribution of $\sin^{-1} r$, using the cumulants as given in equations (1)-(4). The
probability integral values for the cases n = 16, ρ = 0.2 and n = 25, ρ = 0.9, are given in Table 3.


Table 3. Comparison of exact and approximate values of the probability integral of r

(i) n = 16, ρ = 0.2, β₁(y) = 0.0215, β₂(y) = 2.9441

   r      Exact value    Edgeworth
                         expansion for y
 -0.2     0.06711        0.06722
 -0.1     0.12868        0.12953
  0       0.22076        0.22202
  0.1     0.34349        0.34367
  0.2     0.48934        0.48806
  0.3     0.64273        0.64123
  0.4     0.78316        0.78297
  0.5     0.89186        0.89285
  0.6     0.95947        0.96026
  0.7     0.99034        0.99035

(ii) n = 25, ρ = 0.9, β₁(y) = 0.0319, β₂(y) = 3.4403, β₁(z) = 10⁻⁴ × 0.3844, β₂(z) = 3.0897

   r      Exact value    Edgeworth          Edgeworth
                         expansion for y    expansion for z
  0.75    0.00743        0.00684            0.00742
  0.82    0.05574        0.05834            0.05578
  0.87    0.22387        0.22896            0.22379
  0.90    0.46244        0.45513            0.46247
  0.93    0.78645        0.78143            0.78661
  0.95    0.94612        0.95047            0.94612
  0.965   0.99263        0.99660            0.99264




This method of approximation is unlikely to give as good values from the y transformation as from 
the z, because the former has not secured so close an approach to normality as the latter. The y trans- 
formation may be useful as a quick check, however, for since the expectation of y is known exactly 
a number of the terms in the Edgeworth expansion is zero, and it is simple to compute. 


6. From the moments of § 2 it appears that
$$\mathcal{E}(\sin^{-1} r) = \sin^{-1}\rho$$
up to terms of order $n^{-3}$. The identity holds in fact for any order of $n^{-1}$.
The probability distribution of r, the correlation coefficient, calculated from a sample of size n,
randomly drawn from a normal bivariate population with correlation coefficient ρ is known to be
$$P(r \mid n, \rho) = \frac{(1-\rho^2)^{\frac12(n-1)}\,(1-r^2)^{\frac12(n-4)}}{\pi\,(n-3)!}\;\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-r\rho)}{(1-r^2\rho^2)^{\frac12}}\right\}. \tag{11}$$
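Since it is a probability distribution,
$$\int_{-1}^{+1} (1-r^2)^{\frac12(n-4)}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-r\rho)}{(1-r^2\rho^2)^{\frac12}}\right\} dr = \frac{\pi\,(n-3)!}{(1-\rho^2)^{\frac12(n-1)}}.$$
Denoting $\dfrac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\dfrac{\cos^{-1}(-r\rho)}{(1-r^2\rho^2)^{\frac12}}\right\}$ by $f^{n-2}(r\rho)$, and rearranging the integral, we have
$$\int_{-1}^{+1} \frac{1}{(1-r^2)^{\frac12}}\,(1-r^2)^{\frac12(n-3)}\, f^{n-2}(r\rho)\, dr = C_n(\rho), \qquad C_n(\rho) = \frac{\pi\,(n-3)!}{(1-\rho^2)^{\frac12(n-1)}}. \tag{12}$$

Integrating by parts we have
$$C_n(\rho) = \Big[\sin^{-1} r\,(1-r^2)^{\frac12(n-3)} f^{n-2}(r\rho)\Big]_{-1}^{+1} - \int_{-1}^{+1} \sin^{-1} r\,\frac{d}{dr}\Big\{(1-r^2)^{\frac12(n-3)} f^{n-2}(r\rho)\Big\}\, dr.$$

For n > 3, $\big[\sin^{-1} r\,(1-r^2)^{\frac12(n-3)} f^{n-2}(r\rho)\big]_{-1}^{+1} = 0$, since $f^{n-2}(r\rho)$ is finite and non-zero at the limits ±1.
Therefore
$$C_n(\rho) = -\int_{-1}^{+1} \sin^{-1} r\,(1-r^2)^{\frac12(n-3)}\,\frac{d}{dr} f^{n-2}(r\rho)\, dr + (n-3)\int_{-1}^{+1} r \sin^{-1} r\,(1-r^2)^{\frac12(n-5)} f^{n-2}(r\rho)\, dr,$$
and hence
$$C_n(\rho) = -\rho\int_{-1}^{+1} \sin^{-1} r\,(1-r^2)^{\frac12(n-3)} f^{n-1}(r\rho)\, dr + (n-3)\int_{-1}^{+1} \sin^{-1} r\,\frac{\partial}{\partial\rho}\big\{P(r \mid n-1, \rho)\,C_{n-1}(\rho)\big\}\, dr. \tag{13}$$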
=% -1 


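Denoting the expectation of $\sin^{-1} r$, when r is calculated from a sample of size n, by $\mathcal{E}_n(\sin^{-1} r)$, we have
$$C_n(\rho) = -\rho\, C_{n+1}(\rho)\,\mathcal{E}_{n+1}(\sin^{-1} r) + (n-3)\left[C_{n-1}(\rho)\,\frac{\partial}{\partial\rho}\mathcal{E}_{n-1}(\sin^{-1} r) + \mathcal{E}_{n-1}(\sin^{-1} r)\,\frac{\partial}{\partial\rho} C_{n-1}(\rho)\right]. \tag{14}$$

Substituting the values of the C's and rearranging we have
$$\frac{1}{(1-\rho^2)^{\frac12}} - \frac{\partial}{\partial\rho}\mathcal{E}_{n-1}(\sin^{-1} r) = \frac{(n-2)\rho}{(1-\rho^2)}\big[\mathcal{E}_{n-1}(\sin^{-1} r) - \mathcal{E}_{n+1}(\sin^{-1} r)\big]. \tag{15}$$

If $\mathcal{E}_{n-1}(\sin^{-1} r) = \sin^{-1}\rho$, since we have assumed n > 2, it follows that
$$\mathcal{E}_{n+1}(\sin^{-1} r) = \mathcal{E}_{n-1}(\sin^{-1} r) = \sin^{-1}\rho.$$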


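From (12) when n = 3 we have
$$C_3(\rho) = \int_{-1}^{+1} (1-r^2)^{-\frac12} f'(r\rho)\, dr
= \left[\sin^{-1} r\left\{\frac{1}{1-r^2\rho^2} + \frac{\rho r \cos^{-1}(-r\rho)}{(1-r^2\rho^2)^{\frac32}}\right\}\right]_{-1}^{+1} - \rho\, C_4(\rho)\,\mathcal{E}_4(\sin^{-1} r),$$
that is,
$$\frac{\pi}{1-\rho^2} = \frac{\pi}{1-\rho^2} + \frac{\pi\rho \sin^{-1}\rho}{(1-\rho^2)^{\frac32}} - \frac{\pi\rho}{(1-\rho^2)^{\frac32}}\,\mathcal{E}_4(\sin^{-1} r),$$
and therefore
$$\mathcal{E}_4(\sin^{-1} r) = \sin^{-1}\rho. \tag{16}$$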


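From equation (15) it follows that $\mathcal{E}_6(\sin^{-1} r) = \mathcal{E}_4(\sin^{-1} r) = \sin^{-1}\rho$ and, in fact, $\mathcal{E}_{2n}(\sin^{-1} r) = \sin^{-1}\rho$
for n = 2, 3, .... It has not been possible to show by the same method that $\mathcal{E}_n(\sin^{-1} r)$ for odd sample
sizes is equal to $\sin^{-1}\rho$. However, if we assume
$$\mathcal{E}_n(\sin^{-1} r) = \sin^{-1}\rho + \frac{A_4(\rho)}{n^4} + \frac{A_5(\rho)}{n^5} + \ldots$$
for all values of n, where $A_s(\rho)$ for s = 4, 5, ... are functions of ρ, then substituting in equation (15) we
have
$$0 = \frac{(n-2)\rho}{1-\rho^2}\left[A_4(\rho)\left\{\frac{1}{(n-1)^4} - \frac{1}{(n+1)^4}\right\} + A_5(\rho)\left\{\frac{1}{(n-1)^5} - \frac{1}{(n+1)^5}\right\} + \ldots\right] + \frac{A_4'(\rho)}{(n-1)^4} + \frac{A_5'(\rho)}{(n-1)^5} + \ldots,$$
where $A_s'(\rho) = \dfrac{\partial}{\partial\rho} A_s(\rho)$ for s = 4, 5, ....
Equating coefficients of $n^{-4}$ we find that $A_4(\rho) = 0$. Similarly, we find that $0 = A_5(\rho) = A_6(\rho) = \ldots$.

It seems, therefore, reasonable to assume that $\mathcal{E}_n(\sin^{-1} r) = \sin^{-1}\rho$ for all values of n.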
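
The conclusion of this section lends itself to a rough simulation check. The sketch below is not part of the argument above: it is a Monte Carlo illustration under the stated bivariate normal model, with hypothetical function names, a fixed seed, and a sample size chosen only for speed.

```python
import math
import random

def sample_r(n: int, rho: float, rng: random.Random) -> float:
    """Correlation coefficient of one sample of size n from a standard
    bivariate normal population with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1.0 - rho * rho) * v)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def mean_arcsin_r(n: int, rho: float, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expectation of arcsin(r)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        r = max(-1.0, min(1.0, sample_r(n, rho, rng)))  # guard against rounding
        total += math.asin(r)
    return total / reps

estimate = mean_arcsin_r(n=6, rho=0.5, reps=20000)
print(estimate, math.asin(0.5))
```

With these settings the simulated mean of $\sin^{-1} r$ should agree with $\sin^{-1}\rho$ to within sampling error; the agreement is of course only suggestive, not a proof.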
7. In his paper 'New light on the correlation coefficient and its transforms', H. Hotelling (1953,
p. 221) states as a theorem:

'No function $\psi(r)$, independent of ρ and n, satisfying the conditions for expansibility in a Taylor series
through terms of fourth order, and termwise integrable to give the expectation in a series of powers of
$n^{-1}$, exists such that $\mathcal{E}(\psi(r)) = \psi(\rho)$.'

We think, however, that this conclusion is due to a mistake, for if the substitution which Hotelling
refers to in the penultimate line of his p. 220 is made, we find that the coefficient of $n^{-2}$ in his earlier
expansion for $\mathcal{E}(\psi(r)) - \psi(\rho)$ vanishes identically for all values of ρ.


I wish to thank Dr F. N. David and Dr D. E. Barton for the helpful suggestions they made during the 
preparation of this paper. 
Note on the moment-problem for unimodal distributions when
one or both terminals are known


By C. L. MALLOWS 
University College London 


1. Several mathematical models have been proposed in attempts to represent biological data relating, 
for example, to counts of species in randomly chosen regions. Anscombe (1950) gives a discussion, and 
compares eight different two-parameter distributions. He remarks that of these distributions, some 
have only one mode, some one or two, while some may have any number from one upwards. This 
progression is reflected in the behaviour of the third and fourth cumulants; the number of modes appears 
to increase with a decrease in both skewness and leptokurtosis. 

It is of interest to determine whether in fact this must be so; more specifically, to determine the 
‘unimodality boundary’ for such distributions, i.e. a boundary defining values of the lower moments 
such that any distribution having these moments must of necessity be multimodal. 


2. The necessary and sufficient conditions which the sequence of moments $\mu_0, \mu_1, \ldots, \mu_n$ must satisfy
in order that a (cumulative) distribution function $\phi(x)$ having these moments may exist are known in
the situations where x


(a) may take any value between −∞ and +∞;
(b) is restricted to positive values;
(c) is restricted to lie in a finite interval $(t_1, t_2)$ (see, for example, Shohat & Tamarkin (1943)).


By means of a transformation Johnson & Rogers (1951) reduced the case (a), where the additional 
restriction is applied that the distribution is to be unimodal, to the case (a) without restriction. It is 
now shown that this transformation can also be applied to the cases (b) and (c) above. 

These results apply immediately only to continuous distributions; however, we can obtain results
for the discrete case by considering the distribution of the sum of the discrete variable and an
independent rectangular variable. This will have a 'histogram' distribution which may be regarded as
continuous.
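3. The conditions referred to in §2 are as follows (moments about zero throughout)*:

Case (a): $t_1 = -\infty$, $t_2 = +\infty$. The determinants
$$\Delta_r(\mu) = \big|\,\mu_{i+j}\,\big|_{i,j=0}^{r} \qquad (r = 0, 1, \ldots, [\tfrac12 n])$$
are positive.

Case (b): $t_1 = 0$, $t_2 = +\infty$. The determinants
$$\Delta_r(\mu) \quad (r = 0, 1, \ldots, [\tfrac12 n]), \qquad \Delta_r^{(1)}(\mu) = \big|\,\mu_{i+j+1}\,\big|_{i,j=0}^{r} \quad (r = 0, 1, \ldots, [\tfrac12(n-1)])$$
are positive.

Case (c): $t_1 = -1$, $t_2 = +1$. If n is even = 2m, the determinants
$$\Delta_r(\mu), \qquad \Delta_r^{(2)}(\mu) = \big|\,\mu_{i+j} - \mu_{i+j+2}\,\big|_{i,j=0}^{r} \qquad (r = 0, 1, \ldots, m)$$
are positive; if n = 2m+1, the determinants
$$\Delta_r^{+}(\mu) = \big|\,\mu_{i+j} + \mu_{i+j+1}\,\big|_{i,j=0}^{r}, \qquad \Delta_r^{-}(\mu) = \big|\,\mu_{i+j} - \mu_{i+j+1}\,\big|_{i,j=0}^{r} \qquad (r = 0, 1, \ldots, m)$$
are positive.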


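4. The transformation used by Johnson & Rogers in the case (a) is as follows. Suppose $\phi''(x)$ exists
for all x, and
$$\phi''(x)\;\begin{cases} \ge 0 & (x < \beta),\\ = 0 & (x = \beta),\\ \le 0 & (x > \beta).\end{cases}$$

Then the distribution is unimodal, and has a mode at β. Note that we may have $\phi''(x)$ vanishing over an
interval including β, so that there is no unique mode. Any unimodal distribution (continuous, but whose
second derivative need not exist; e.g. a 'histogram' distribution) can be approximated as closely as
desired by such a $\phi(x)$.

Consider the function $\psi(x)$ given by
$$\psi'(x) = (\beta - x)\,\phi''(x), \qquad \psi(-\infty) = 0.$$

Then $\psi(x)$ is a distribution function with moments
$$\nu_r = \int x^r\, d\psi(x) = (r+1)\mu_r - \beta r \mu_{r-1},$$
and so the ν's must satisfy the conditions of §3(a). We thus obtain limits for β for the given moments
$\mu_0, \mu_1, \ldots, \mu_n$, and may derive the required conditions on the moments themselves.
The determinant $\Delta_r(\nu)$ can be manipulated into the following form:
$$\Delta_r(\nu) = \begin{vmatrix}
1 & \beta & \beta^2 & \cdots & \beta^{r+1}\\
0 & \mu_0 & 2\mu_1 & \cdots & (r+1)\mu_r\\
\mu_0 & 2\mu_1 & 3\mu_2 & \cdots & (r+2)\mu_{r+1}\\
\vdots & \vdots & \vdots & & \vdots\\
r\mu_{r-1} & (r+1)\mu_r & (r+2)\mu_{r+1} & \cdots & (2r+1)\mu_{2r}
\end{vmatrix},$$
which will be denoted by $\Delta_r(\nu) = \big|\,\beta^j;\ (i+j-1)\,\mu_{i+j-2}\,\big|_{i,j=0}^{r+1}$, the first row (i = 0) consisting of the powers $\beta^j$.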
5. The same transformation can be applied in the cases (b) and (c) above. We obtain the following
results, where the moments are about zero throughout.

For a unimodal solution of the moment problem to exist with a mode at β, it is necessary and sufficient
that:

Case (b). Terminals $t_1 = 0$, $t_2 = +\infty$. The determinants
$$\Delta_r(\nu) \quad (r = 0, 1, \ldots, [\tfrac12 n]), \qquad \Delta_r^{(1)}(\nu) = \big|\,\beta^j;\ (i+j)\,\mu_{i+j-1}\,\big|_{i,j=0}^{r+1} \quad (r = 0, 1, \ldots, [\tfrac12(n-1)])$$
are positive.

Case (c). Terminals $t_1 = -1$, $t_2 = +1$. If n = 2m, the determinants
$$\Delta_r(\nu), \qquad \Delta_r^{(2)}(\nu) = \big|\,\beta^j;\ (i+j-1)\,\mu_{i+j-2} - (i+j+1)\,\mu_{i+j}\,\big|_{i,j=0}^{r+1} \qquad (r = 0, 1, \ldots, m)$$
are positive; if n = 2m+1, the determinants
$$\Delta_r^{+}(\nu) = \big|\,\beta^j;\ (i+j-1)\,\mu_{i+j-2} + (i+j)\,\mu_{i+j-1}\,\big|_{i,j=0}^{r+1}, \qquad
\Delta_r^{-}(\nu) = \big|\,\beta^j;\ (i+j-1)\,\mu_{i+j-2} - (i+j)\,\mu_{i+j-1}\,\big|_{i,j=0}^{r+1} \qquad (r = 0, 1, \ldots, m)$$
are positive.

* These results apply strictly only if the spectrum of the distribution function does not reduce to
a finite set of points; in the contrary case the determinants will vanish for large r.


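6. (i) Thus in case (c) (terminals −1, +1), where the moments $\mu_0 = 1$, $\mu_1$, $\mu_2$ are known, the first few
conditions reduce to
$$(\beta - \mu_1)^2 < 3(\mu_2 - \mu_1^2) < 1 - 3\mu_1^2 + 2\mu_1\beta,$$
and hence we must have (since −1 < β < 1)
$$0 < 3(\mu_2 - \mu_1^2) < \begin{cases} (1-\mu_1)(1+3\mu_1) & (\mu_1 > 0),\\ (1+\mu_1)(1-3\mu_1) & (\mu_1 < 0).\end{cases}$$

These are inequalities for the moments $\mu_1$, $\mu_2$ when the terminals are given as −1, +1. We may write
them also as inequalities for the terminals $t_1$, $t_2$ when the moments are given as $\mu_1 = 0$, $\mu_2 = 1$; we find
$$3 + 2t_1t_2 + t_2^2 < 0 \quad (t_1 + t_2 < 0), \qquad 3 + 2t_1t_2 + t_1^2 < 0 \quad (t_1 + t_2 > 0).$$

(ii) In case (b) (lower terminal zero) the first few conditions are:
$$0 < 2\mu_1 - \beta, \qquad 0 < 3(\mu_2 - \mu_1^2) - (\beta - \mu_1)^2,$$
$$0 < 8\mu_1\mu_3 - 9\mu_2^2 + 2\beta(3\mu_1\mu_2 - 2\mu_3) + \beta^2(3\mu_2 - 4\mu_1^2),$$
$$\begin{aligned}
0 < {}& 15\mu_2\mu_4 - 16\mu_3^2 + 48\mu_1\mu_2\mu_3 - 20\mu_1^2\mu_4 - 27\mu_2^3\\
& - \beta(16\mu_1^2\mu_3 + 12\mu_2\mu_3 - 10\mu_1\mu_4 - 18\mu_1\mu_2^2)\\
& + \beta^2(8\mu_1\mu_3 + 9\mu_2^2 - 5\mu_4 - 12\mu_1^2\mu_2) - \beta^3(12\mu_1\mu_2 - 4\mu_3 - 8\mu_1^3).
\end{aligned}$$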


7. Suppose y is a random variable taking only the values 0, 1, ... and having moments (about zero)
$\eta_0, \eta_1, \ldots$, and u is an independent rectangular variable (uniform in (0, 1)). Then x = y + u has moments
$$\mu_r = \sum_{s=0}^{r} \binom{r}{s}\frac{\eta_s}{r-s+1} = \frac{1}{r+1}\sum_{s=0}^{r}\binom{r+1}{s}\eta_s,$$
and x is unimodal only if y is. In this case we may take the mode β of x to be at the mode of y.
The inequalities of § 6 (ii) will now give conditions on the moments and mode of y; these conditions
will not be the best possible, as in the derivation of the inequalities of § 6 (ii) we have not used the additional
restriction that x must have a 'histogram' distribution. The error will be small. We obtain the following
inequalities:
$$0 < 2\eta_1 + 1 - \beta, \qquad 0 < 3(\eta_2 - \eta_1^2) + \tfrac14 - (\beta - \eta_1 - \tfrac12)^2,$$
$$0 < 8\eta_1\eta_3 - 9\eta_2^2 + 4\eta_3 - 6\eta_1\eta_2 - \eta_1^2 - \beta(4\eta_3 - 6\eta_1\eta_2 + 3\eta_2 - 6\eta_1^2 - \eta_1) + \beta^2(3\eta_2 - 4\eta_1^2 - \eta_1).$$

A general investigation of the implications of these inequalities is complicated by the fact that we 
cannot standardize the moments; ‘location’ and ‘scale’ are determined by the lower terminal (zero) 
and the interval (unity). 

Anscombe compared four distributions having $\eta_1 = 20$, $\eta_2 = 620$ (or 618 in case (iv)), namely:
(i) negative binomial, (ii) Polya-Aeppli, (iii) Neyman Type A, (iv) Thomas. Of these, (i) has one mode,
(ii) two, and (iii) and (iv) each three. The third and fourth moments can easily be obtained from the
generating functions, and substituted in the inequalities above. We find that the moments of even the
Thomas distribution could belong to a unimodal distribution, with mode β in the interval (0, 27). Put
another way, we cannot deduce from these moments alone that the distribution must be multimodal.

Taking more moments will of course give closer limits for β; and if the distribution is in fact multimodal,
then for a sufficiently large number of moments the inequalities will not be satisfied for any β.
On inverting a class of patterned matrices


By S. N. ROY and A. E. SARHAN*


Institute of Statistics, University of North Carolina


1. Introduction. We shall use here bold capital letters for matrices and primed letters for their
transposes, small letters for scalars, small bold letters for column vectors, small bold primed letters for
row vectors. For example, a will stand for a column vector whose typical element is $a_i$ and (ab/c) for
a column vector whose typical element is $a_i b_i / c_i$. $D_a(k \times k)$ will stand for a diagonal matrix whose
diagonal elements are $(a_1, a_2, \ldots, a_k)$ and $M(k \times k)$ will stand for a lower triangular matrix. If all
elements of M are unity we shall write $M(k \times l)$ as $J(k \times l)$.

















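2. If
$$M_{ab}(k \times k) = \begin{pmatrix}
a_1b_1 & 0 & 0 & \cdots & 0\\
a_1b_2 & a_2b_2 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
a_1b_k & a_2b_k & a_3b_k & \cdots & a_kb_k
\end{pmatrix}, \tag{2.1}$$
then it is easy to check that
$$M_{ab}^{-1}(k \times k) = \begin{pmatrix}
1/b_1a_1 & 0 & \cdots & 0 & 0 & 0\\
-1/b_1a_2 & 1/b_2a_2 & \cdots & 0 & 0 & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & \cdots & -1/b_{k-2}a_{k-1} & 1/b_{k-1}a_{k-1} & 0\\
0 & 0 & \cdots & 0 & -1/b_{k-1}a_k & 1/b_ka_k
\end{pmatrix}. \tag{2.2}$$
3. If
$$C = \begin{pmatrix}
c_1^2a_1 & c_1c_2a_1 & c_1c_3a_1 & \cdots & c_1c_na_1\\
c_1c_2a_1 & c_2^2(a_1+a_2) & c_2c_3(a_1+a_2) & \cdots & c_2c_n(a_1+a_2)\\
c_1c_3a_1 & c_2c_3(a_1+a_2) & c_3^2(a_1+a_2+a_3) & \cdots & c_3c_n(a_1+a_2+a_3)\\
\vdots & \vdots & \vdots & & \vdots\\
c_1c_na_1 & c_2c_n(a_1+a_2) & c_3c_n(a_1+a_2+a_3) & \cdots & c_n^2(a_1+a_2+\ldots+a_n)
\end{pmatrix}, \tag{3.1}$$
then
$$C^{-1} = \begin{pmatrix}
\dfrac{1}{c_1^2}\Big(\dfrac{1}{a_1}+\dfrac{1}{a_2}\Big) & -\dfrac{1}{c_1c_2a_2} & 0 & \cdots & 0 & 0\\
-\dfrac{1}{c_1c_2a_2} & \dfrac{1}{c_2^2}\Big(\dfrac{1}{a_2}+\dfrac{1}{a_3}\Big) & -\dfrac{1}{c_2c_3a_3} & \cdots & 0 & 0\\
0 & -\dfrac{1}{c_2c_3a_3} & \dfrac{1}{c_3^2}\Big(\dfrac{1}{a_3}+\dfrac{1}{a_4}\Big) & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots\\
0 & 0 & 0 & \cdots & -\dfrac{1}{c_{n-1}c_na_n} & \dfrac{1}{c_n^2a_n}
\end{pmatrix}. \tag{3.2}$$
To prove this, we put $C = M_{1c}(n \times n)\, D_a(n \times n)\, M_{1c}'(n \times n)$, invert both sides and use (2.1) and (2.2).


* This research was supported, in part, by the United States Air Force through the Office of
Scientific Research of the Air Research and Development Command.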


* This research was supported, in part, by the United States Air Force through the Office of 
Scientific Research of the Air Research and Development Command. 
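Example 1. We need to invert for various purposes the following symmetric positive-definite matrix
which is the variance-covariance matrix of order statistics for a sample of size n from an exponential
population (Sarhan, 1954):
$$V(n \times n) = \begin{pmatrix}
\dfrac{1}{n^2} & \dfrac{1}{n^2} & \cdots & \dfrac{1}{n^2} & \cdots & \dfrac{1}{n^2}\\
\dfrac{1}{n^2} & \dfrac{1}{n^2}+\dfrac{1}{(n-1)^2} & \cdots & \dfrac{1}{n^2}+\dfrac{1}{(n-1)^2} & \cdots & \dfrac{1}{n^2}+\dfrac{1}{(n-1)^2}\\
\vdots & \vdots & & \vdots & & \vdots\\
\dfrac{1}{n^2} & \dfrac{1}{n^2}+\dfrac{1}{(n-1)^2} & \cdots & \sum\limits_{j=1}^{i}\dfrac{1}{(n-j+1)^2} & \cdots & \sum\limits_{j=1}^{i}\dfrac{1}{(n-j+1)^2}\\
\vdots & \vdots & & \vdots & & \vdots\\
\dfrac{1}{n^2} & \dfrac{1}{n^2}+\dfrac{1}{(n-1)^2} & \cdots & \sum\limits_{j=1}^{i}\dfrac{1}{(n-j+1)^2} & \cdots & \sum\limits_{j=1}^{n}\dfrac{1}{(n-j+1)^2}
\end{pmatrix}. \tag{3.3}$$
This can be put in the form (3.1) by putting
$$c_1 = c_2 = \ldots = c_n = 1$$
and
$$a_1 = \frac{1}{n^2}, \quad a_2 = \frac{1}{(n-1)^2}, \quad a_3 = \frac{1}{(n-2)^2}, \quad \ldots, \quad a_n = 1. \tag{3.4}$$
Therefore, by substituting in (3.2), we have
$$V^{-1} = \begin{pmatrix}
n^2+(n-1)^2 & -(n-1)^2 & 0 & \cdots & 0 & 0\\
-(n-1)^2 & (n-1)^2+(n-2)^2 & -(n-2)^2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots\\
0 & 0 & 0 & \cdots & -1 & 1
\end{pmatrix}. \tag{3.5}$$
This inverse can also be obtained by induction.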
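
The tridiagonal inverse (3.2), and its special case in Example 1, can be confirmed by direct multiplication. The sketch below uses exact rational arithmetic; the function names are hypothetical, and the choice n = 4 is only for illustration.

```python
from fractions import Fraction

def patterned_matrix(a, c):
    """Matrix with (i, j) element c_i c_j (a_1 + ... + a_min(i,j)), as in (3.1)."""
    n = len(a)
    return [[c[i] * c[j] * sum(a[:min(i, j) + 1]) for j in range(n)] for i in range(n)]

def patterned_inverse(a, c):
    """Tridiagonal inverse in the form (3.2)."""
    n = len(a)
    inv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        diag = 1 / a[i] + (1 / a[i + 1] if i + 1 < n else 0)
        inv[i][i] = diag / (c[i] * c[i])
        if i + 1 < n:
            inv[i][i + 1] = inv[i + 1][i] = -1 / (c[i] * c[i + 1] * a[i + 1])
    return inv

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

# Example 1 with n = 4: a_l = 1/(n - l + 1)^2 and all c_i = 1.
n = 4
a = [Fraction(1, (n - l) ** 2) for l in range(n)]
c = [Fraction(1)] * n
V_inv = patterned_inverse(a, c)
print(V_inv[0][:2], V_inv[-1][-2:])
print(matmul(patterned_matrix(a, c), V_inv) == identity(n))
```

The leading element is $n^2 + (n-1)^2$ and the trailing 2 × 2 corner ends in −1, 1, in agreement with (3.5).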
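4. If
$$C(k \times k) = [D_p(k \times k) + \lambda\, q(k \times 1)\, r'(1 \times k)], \tag{4.1}$$
then
$$C^{-1} = D_{1/p} - \frac{\lambda}{\Big(1 + \lambda \sum\limits_{i=1}^{k} q_i r_i / p_i\Big)}\,(q/p)\,(r/p)'. \tag{4.2}$$
Proof. Assume that
$$C^{-1} = D_{1/p}(k \times k) + \mu\,(q/p)\,(r/p)', \tag{4.3}$$
where μ is unknown. Therefore
$$\begin{aligned}
I(k \times k) = CC^{-1} &= (D_p + \lambda q r')\,(D_{1/p} + \mu (q/p)(r/p)')\\
&= I + \mu\, q (r/p)' + \lambda\, q (r/p)' + \lambda\mu \sum_{i=1}^{k} (q_i r_i / p_i)\; q (r/p)'\\
&= I + \Big(\mu + \lambda + \lambda\mu \sum_{i=1}^{k} q_i r_i / p_i\Big)\, q (r/p)'.
\end{aligned}$$
Thus
$$\mu\Big(1 + \lambda \sum_{i=1}^{k} q_i r_i / p_i\Big) + \lambda = 0 \tag{4.4}$$
or
$$\mu = -\lambda \Big/ \Big(1 + \lambda \sum_{i=1}^{k} q_i r_i / p_i\Big),$$
which completes the proof of (4.2).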
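
Result (4.2) is straightforward to check numerically. The following sketch, with hypothetical function names and exact fractions, builds C from (4.1), forms the claimed inverse, and multiplies the two; the multinomial probabilities used are arbitrary illustrative values.

```python
from fractions import Fraction

def rank_one_matrix(p, q, r, lam):
    """C = D_p + lam * q r', as in (4.1)."""
    k = len(p)
    return [[(p[i] if i == j else 0) + lam * q[i] * r[j] for j in range(k)] for i in range(k)]

def rank_one_inverse(p, q, r, lam):
    """Inverse in the form (4.2): D_{1/p} + mu (q/p)(r/p)'."""
    k = len(p)
    s = sum(q[i] * r[i] / p[i] for i in range(k))
    mu = -lam / (1 + lam * s)
    return [[(1 / p[i] if i == j else 0) + mu * (q[i] / p[i]) * (r[j] / p[j])
             for j in range(k)] for i in range(k)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Multinomial covariance matrix: q = r = p and lam = -1 (illustrative probabilities).
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
V = rank_one_matrix(p, p, p, Fraction(-1))
V_inv = rank_one_inverse(p, p, p, Fraction(-1))
print(V_inv)
print(matmul(V, V_inv))
```

With these probabilities $1 - \sum p_i = 1/8$, so every off-diagonal element of the inverse is 8 and the diagonal elements are $1/p_i + 8$.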


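Example 2. We need to invert the following symmetric positive-definite matrix which is the variance-
covariance matrix of the multinomial distribution:
$$V(k \times k) = \begin{pmatrix}
p_1(1-p_1) & -p_1p_2 & -p_1p_3 & \cdots & -p_1p_k\\
-p_1p_2 & p_2(1-p_2) & -p_2p_3 & \cdots & -p_2p_k\\
-p_1p_3 & -p_2p_3 & p_3(1-p_3) & \cdots & -p_3p_k\\
\vdots & \vdots & \vdots & & \vdots\\
-p_1p_k & -p_2p_k & -p_3p_k & \cdots & p_k(1-p_k)
\end{pmatrix}. \tag{4.5.1}$$
This can be put in the form
$$D_p(k \times k) + \lambda\, p(k \times 1)\, p'(1 \times k),$$
where λ = −1, that is,
$$V = \begin{pmatrix} p_1 & & 0\\ & \ddots & \\ 0 & & p_k \end{pmatrix} - \begin{pmatrix} p_1\\ \vdots\\ p_k \end{pmatrix} (p_1 \ldots p_k).$$
Therefore, from (4.2) we have
$$V^{-1} = D_{1/p} + \frac{1}{1 - \sum\limits_{i=1}^{k} p_i}\, J
= \begin{pmatrix}
\dfrac{1}{p_1} + \dfrac{1}{1-\sum p_i} & \dfrac{1}{1-\sum p_i} & \cdots & \dfrac{1}{1-\sum p_i}\\
\dfrac{1}{1-\sum p_i} & \dfrac{1}{p_2} + \dfrac{1}{1-\sum p_i} & \cdots & \dfrac{1}{1-\sum p_i}\\
\vdots & \vdots & & \vdots\\
\dfrac{1}{1-\sum p_i} & \dfrac{1}{1-\sum p_i} & \cdots & \dfrac{1}{p_k} + \dfrac{1}{1-\sum p_i}
\end{pmatrix}, \tag{4.5.2}$$
where each sum runs over i = 1, ..., k.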
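5. If
$$C\{(n+p) \times (n+p)\} = \begin{pmatrix} D_k & aJ\\ aJ & D_m \end{pmatrix}, \tag{5.1}$$
the row and column blocks being of sizes n and p, then
$$C^{-1} = \begin{pmatrix} \dfrac{1}{k} I + \alpha J & \beta J\\[2mm] \beta J & \dfrac{1}{m} I + \gamma J \end{pmatrix}, \tag{5.2}$$
with the same partition, where
$$\alpha = -\frac{a^2 p}{k(na^2p - mk)}, \qquad \beta = \frac{a}{na^2p - mk}, \qquad \gamma = -\frac{a^2 n}{m(na^2p - mk)}. \tag{5.3}$$

Proof. Assuming $C^{-1}$ to be of the general structure (5.2) and equating $CC^{-1}$ to $I\{(n+p) \times (n+p)\}$
we have
$$I = \begin{pmatrix}
\dfrac{1}{k} D_k + \alpha k J + \beta a p J & \beta k J + \dfrac{a}{m} J + a\gamma p J\\[2mm]
\beta k J + \dfrac{a}{m} J + a\gamma p J & \dfrac{1}{m} D_m + \gamma m J + a\beta n J
\end{pmatrix}$$
or
$$\alpha k + \beta a p = 0, \qquad \beta k + \frac{a}{m} + a\gamma p = 0, \qquad \gamma m + a n \beta = 0, \tag{5.4}$$
from which, by solving for α, β and γ, we have (5.3) and thus (5.2).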
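
A quick check of (5.3) is sketched below. It assumes, as the notation suggests, that $D_k$ and $D_m$ are the scalar diagonal matrices kI and mI; the function names and the numerical values are illustrative only.

```python
from fractions import Fraction

def block_matrix(n, p, k, m, a):
    """C of (5.1), taking D_k = k I (n x n) and D_m = m I (p x p): an assumption."""
    size = n + p
    def entry(i, j):
        if i < n and j < n:
            return k if i == j else Fraction(0)
        if i >= n and j >= n:
            return m if i == j else Fraction(0)
        return a
    return [[entry(i, j) for j in range(size)] for i in range(size)]

def block_inverse(n, p, k, m, a):
    """Inverse of the structure (5.2) with alpha, beta, gamma taken from (5.3)."""
    denom = n * a * a * p - m * k
    alpha = -a * a * p / (k * denom)
    beta = a / denom
    gamma = -a * a * n / (m * denom)
    size = n + p
    def entry(i, j):
        if i < n and j < n:
            return (1 / k if i == j else 0) + alpha
        if i >= n and j >= n:
            return (1 / m if i == j else 0) + gamma
        return beta
    return [[entry(i, j) for j in range(size)] for i in range(size)]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

args = (2, 3, Fraction(5), Fraction(7), Fraction(2))  # n, p, k, m, a
product = matmul(block_matrix(*args), block_inverse(*args))
print(all(product[i][j] == int(i == j) for i in range(5) for j in range(5)))
```

The formulae require $na^2p \ne mk$; when the two are equal C is singular and the sketch would divide by zero.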


Example 3. We need to invert the following non-singular matrix which occurs in analysis of variance
for two-way classification with equal frequencies in the different cells
$$C = \begin{pmatrix} D_k & J\\ J & D_m \end{pmatrix}, \tag{5.5}$$
the row and column blocks being of sizes m and k − 1.
This is easily seen to be a special case of (5.1) by putting a = 1, n = m, p = k − 1. Now substituting in
(5.3) and (5.2) we have
$$C^{-1} = \begin{pmatrix} \dfrac{1}{k} I + \dfrac{k-1}{km} J & -\dfrac{1}{m} J\\[2mm] -\dfrac{1}{m} J & \dfrac{1}{m} I + \dfrac{1}{m} J \end{pmatrix}, \tag{5.6}$$
with the same partition.
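6. If
$$C\{(3k-2) \times (3k-2)\} = \begin{pmatrix} D_k & J & J\\ J & D_k & J\\ J & J & D_k \end{pmatrix}, \tag{6.1}$$
the row and column blocks being of sizes k, k − 1 and k − 1, then
$$C^{-1} = \begin{pmatrix} D_{1/k} + \alpha J & \beta J & \gamma J\\ \beta J & D_{1/k} + \delta J & \phi J\\ \gamma J & \phi J & D_{1/k} + \epsilon J \end{pmatrix}, \tag{6.2}$$
with the same partition, where
$$\alpha = 2(k-1)/k^2, \qquad \beta = \gamma = -1/k, \qquad \delta = \epsilon = 1/k \qquad \text{and} \qquad \phi = 0. \tag{6.3}$$

Proof. Assuming $C^{-1}$ to be of the general structure (6.2) and equating $CC^{-1}$ to $I\{(3k-2) \times (3k-2)\}$,
we have, as in the case of (5.2), six linear equations in the six unknowns α, β, γ, δ, φ, ε, from which, by
solving, we have the unknowns in the form (6.3).

The matrices C and $C^{-1}$ of (6.1) and (6.2) occur in analysis of variance for Latin squares.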
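
The constants in (6.3) can be checked for small k by multiplying out. The sketch below assumes that each diagonal block $D_k$ is kI and each off-diagonal block is a matrix of ones; the function names are hypothetical.

```python
from fractions import Fraction

def block_index(k):
    """Block label (0, 1 or 2) of each row for blocks of sizes k, k-1, k-1."""
    return [0] * k + [1] * (k - 1) + [2] * (k - 1)

def latin_square_matrix(k):
    """C of (6.1), taking D_k = k I and J a matrix of ones (assumption)."""
    blk = block_index(k)
    size = len(blk)
    return [[Fraction(k) if i == j else Fraction(int(blk[i] != blk[j]))
             for j in range(size)] for i in range(size)]

def latin_square_inverse(k):
    """Inverse of the structure (6.2) with the constants of (6.3)."""
    blk = block_index(k)
    size = len(blk)
    alpha, beta, delta = Fraction(2 * (k - 1), k * k), Fraction(-1, k), Fraction(1, k)
    coef = [[alpha, beta, beta], [beta, delta, Fraction(0)], [beta, Fraction(0), delta]]
    return [[(Fraction(1, k) if i == j else 0) + coef[blk[i]][blk[j]]
             for j in range(size)] for i in range(size)]

def is_identity(M):
    return all(M[i][j] == int(i == j) for i in range(len(M)) for j in range(len(M)))

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

for k in (2, 3, 4):
    print(k, is_identity(matmul(latin_square_matrix(k), latin_square_inverse(k))))
```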


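7. If
$$C(k \times k) = \begin{pmatrix}
a & b & b & \cdots & b\\
b & c & d & \cdots & d\\
b & d & c & \cdots & d\\
\vdots & \vdots & \vdots & & \vdots\\
b & d & d & \cdots & c
\end{pmatrix}, \tag{7.1}$$
then
$$C^{-1} = \begin{pmatrix}
e & f & f & \cdots & f\\
f & g & h & \cdots & h\\
f & h & g & \cdots & h\\
\vdots & \vdots & \vdots & & \vdots\\
f & h & h & \cdots & g
\end{pmatrix}, \tag{7.2}$$
where
$$e = \frac{1}{a}[1 + \Delta b^2(k-1)], \quad f = -\Delta b, \quad g = \frac{1}{c-d}[1 - \Delta(ad - b^2)], \quad h = \Delta(b^2 - ad)/(c-d), \tag{7.3}$$
and
$$\Delta = 1/[a(c-d) + (k-1)(ad - b^2)]. \tag{7.4}$$
Proof. The right-hand side of (7.1) can be written in the form
$$\begin{pmatrix} a & b\mathbf{1}'\\ b\mathbf{1} & D_{c-d} + dJ \end{pmatrix}, \tag{7.5}$$
the row and column blocks being of sizes 1 and k − 1.
We now assume that the right-hand side of (7.2) can be written in the form
$$\begin{pmatrix} e & f\mathbf{1}'\\ f\mathbf{1} & D_{g-h} + hJ \end{pmatrix}, \tag{7.6}$$
with the same partition, where e, f, g, h are undetermined, and $\mathbf{1}' = (1, 1, \ldots, 1)$.
Multiplying (7.5) with (7.6) and equating to $I(k \times k)$, we have
$$\begin{pmatrix} 1 & \mathbf{0}'\\ \mathbf{0} & I \end{pmatrix} = \begin{pmatrix}
ae + bf(k-1) & af\mathbf{1}' + b(g-h)\mathbf{1}' + bh(k-1)\mathbf{1}'\\
af\mathbf{1} + b(g-h)\mathbf{1} + bh(k-1)\mathbf{1} & bfJ + (c-d)(g-h)I + h(c-d)J + d(g-h)J + dh(k-1)J
\end{pmatrix}. \tag{7.7}$$
Thus we should have
$$\begin{aligned}
& ae + bf(k-1) = 1, \qquad af + b(g-h) + bh(k-1) = 0, \qquad (c-d)(g-h) = 1,\\
& bf + h(c-d) + d(g-h) + dh(k-1) = 0,
\end{aligned} \tag{7.8}$$
solving which we obtain (7.3).
The matrix (7.1) is common in response surfaces. The results of 5, 6 and 7 can also be subsumed under 4.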
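
The solution (7.3)-(7.4) can be verified in the same way as the earlier results. The sketch below uses hypothetical function names and arbitrary illustrative values of a, b, c, d and k.

```python
from fractions import Fraction

def bordered_matrix(a, b, c, d, k):
    """C of (7.1): a in the corner, b along the border, c on the diagonal, d elsewhere."""
    def entry(i, j):
        if i == 0 and j == 0:
            return a
        if i == 0 or j == 0:
            return b
        return c if i == j else d
    return [[entry(i, j) for j in range(k)] for i in range(k)]

def bordered_inverse(a, b, c, d, k):
    """Inverse of the pattern (7.2) with e, f, g, h from (7.3) and Delta from (7.4)."""
    delta = 1 / (a * (c - d) + (k - 1) * (a * d - b * b))
    e = (1 + delta * b * b * (k - 1)) / a
    f = -delta * b
    g = (1 - delta * (a * d - b * b)) / (c - d)
    h = delta * (b * b - a * d) / (c - d)
    return bordered_matrix(e, f, g, h, k)

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

vals = (Fraction(2), Fraction(1), Fraction(5), Fraction(3), 4)  # a, b, c, d, k
product = matmul(bordered_matrix(*vals), bordered_inverse(*vals))
print(all(product[i][j] == int(i == j) for i in range(4) for j in range(4)))
```

Because C and its inverse share the same pattern, the same builder function serves for both, which keeps the check short.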


The authors would like to thank Prof. M. G. Kendall for his valuable editorial suggestions. 
A note on the risks of error involved in the sequential ratio test


By J. MEDHI 
Institut de Statistique, University of Paris 


1. INTRODUCTION 


For testing hypotheses, the sequential procedure, formulated by Wald, at first involves the successive
calculation of
$$Z_1 = z_1, \quad Z_2 = z_1 + z_2, \quad \ldots, \quad Z_n = z_1 + \ldots + z_n,$$
where $z_i = z(x_i) = \log\{f_0(x_i)/f_1(x_i)\}$ is the logarithm of the ratio of
the densities $f_0(x)$, $f_1(x)$ (the probabilities $p_0$, $p_1$ in case of discrete distribution) of x corresponding to the
two alternative hypotheses $H_0$, $H_1$ respectively. Then, if at any stage Z > a, $H_0$ is accepted; if Z < b,
$H_1$ is accepted and in case b < Z < a, a new independent sample* is to be taken. Let the chance of error
in accepting $H_1$ when $H_0$ is true be α and that in accepting $H_0$ when $H_1$ is true be β. Then we have
approximately a = log(1 − α) − log β and b = log α − log(1 − β).

That the sequential analysis theory is also applicable to a single sample was first pointed out by
G. A. Barnard. As a solution to the important problem of discrimination between two types of spectra
(continuous and discrete) of time series, a test procedure based on such an application has been suggested
by Bartlett (1954). Here the criterion is the log likelihood ratio, $\log(p_0/p_1)$, where the probabilities
$p_0$, $p_1$ for the sample for each of the hypotheses, $H_0$ (discrete spectrum), $H_1$ (continuous spectrum) have
to be maximized with respect to the unknown parameters involved in the specifications. We regard the
entire available sample of n observations as the first of a hypothetical sequence of independent samples.
Then if we take the risks of error in accepting $H_0$ when $H_1$ is true and vice versa to be equal, being at most
not greater than ε, we may, following sequential theory, adopt the decision rule:

(i) if $\log(p_0/p_1) > \log\dfrac{1-\epsilon}{\epsilon}$, then accept $H_0$,

(ii) if $\log(p_0/p_1) < \log\dfrac{\epsilon}{1-\epsilon}$, then accept $H_1$,

(iii) if $\log\dfrac{1-\epsilon}{\epsilon} > \log(p_0/p_1) > \log\dfrac{\epsilon}{1-\epsilon}$, then reach no decision (see also Medhi (1956)).
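
The boundaries and the three-way rule above amount to very little computation. The sketch below is illustrative only: the function names and the returned strings are hypothetical, and natural logarithms are assumed.

```python
import math

def wald_boundaries(alpha: float, beta: float):
    """Approximate boundaries a = log(1 - alpha) - log(beta), b = log(alpha) - log(1 - beta)."""
    a = math.log(1.0 - alpha) - math.log(beta)
    b = math.log(alpha) - math.log(1.0 - beta)
    return a, b

def decide(log_ratio: float, eps: float) -> str:
    """Single-sample rule with both risks taken equal to eps."""
    a, b = wald_boundaries(eps, eps)
    if log_ratio > a:
        return "accept H0"
    if log_ratio < b:
        return "accept H1"
    return "no decision"

print(wald_boundaries(0.05, 0.05))
print(decide(3.5, 0.05), "|", decide(0.0, 0.05), "|", decide(-3.0, 0.05))
```

With ε = 0.05 the boundaries are symmetric about zero, at ± log 19.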


Now when the probability of a decision with the single sample available is not small, such a procedure
might lead to a rather over-cautious assessment of maximum risks ε. It is therefore advisable to investigate
the more precise risks of error in such circumstances; in ordinary sequential analysis we know that
the usual procedure gives a good approximation in most cases. In this note an explicit example has been
considered to obtain such risks of error in the circumstances stated.


* The symbol x may denote a group of observations, i.e. a ‘sample’. 
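2. MEAN OF A NORMAL SAMPLE

Let
$$f(x) = \frac{1}{\sqrt{(2\pi\sigma^2)}} \exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\} \qquad (-\infty < x < \infty), \tag{2.1}$$
and let the two alternative hypotheses be
$$H_0: \text{mean} = +m, \qquad H_1: \text{mean} = -m.$$
Then
$$z_i = z(x_i) = \log\frac{f_0(x_i)}{f_1(x_i)} = \frac{2m}{\sigma^2}\,x_i. \tag{2.2}$$

We suppose that the hypothesis $H_0$ obtains.
Let $p^{(i)}$ ($q^{(i)}$) denote the probability of arriving at the right (wrong) decision at the ith step, where
$$p^{[n]} = p^{(1)} + p^{(2)} + \ldots + p^{(n)}, \qquad q^{[n]} = q^{(1)} + \ldots + q^{(n)}.$$

Now g(z), the frequency function of z, is given by
$$g(z) = \frac{1}{\sqrt{(2\pi\sigma_z^2)}} \exp\left\{-\frac{(z - \frac12\sigma_z^2)^2}{2\sigma_z^2}\right\} \qquad (\sigma_z = 2m/\sigma). \tag{2.3}$$

The frequency function $g_1(Z_1)$ of $Z_1 = z_1$ is of exactly the same form as g(z), so that we have
$$p^{(1)} = \int_a^{\infty} g_1(Z_1)\, dZ_1 = \int_a^{\infty} g(z)\, dz, \tag{2.4}$$
and
$$q^{(1)} = \int_{-\infty}^{b} g_1(Z_1)\, dZ_1 = \int_{-\infty}^{b} g(z)\, dz. \tag{2.5}$$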


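$g_n(Z_n)$, the conditional distribution of $Z_n = z_1 + \ldots + z_n$, is given for n > 1 by the 'truncated
convolution' (Samuelson, 1948),
$$g_n(Z_n) = \int_b^a g(Z_n - s)\, g_{n-1}(s)\, ds. \tag{2.6}$$
We have here
$$\begin{aligned}
g_2(Z_2) &= \int_b^a g(Z_2 - s)\, g_1(s)\, ds \qquad (-\infty < Z_2 < \infty)\\
&= \frac{1}{\sqrt2\,\sqrt{(2\pi\sigma_z^2)}} \exp\left\{-\frac{\frac12(Z_2 - \sigma_z^2)^2}{2\sigma_z^2}\right\} \left[F\left\{\frac{\sqrt2}{\sigma_z}(a - \tfrac12 Z_2)\right\} - F\left\{\frac{\sqrt2}{\sigma_z}(b - \tfrac12 Z_2)\right\}\right],
\end{aligned} \tag{2.7}$$
where
$$F(x) = \int_{-\infty}^{x} f(t)\, dt, \qquad f(t) = \frac{1}{\sqrt{(2\pi)}}\exp(-\tfrac12 t^2).$$
Hence
$$p^{(2)} = \int_a^{\infty} g_2(Z_2)\, dZ_2, \tag{2.8}$$
and
$$q^{(2)} = \int_{-\infty}^{b} g_2(Z_2)\, dZ_2 = \int_{-\infty}^{b_2} \frac{1}{\sqrt{(2\pi)}} \exp(-\tfrac12 t^2)\,\{F(t + c') - F(t + c)\}\, dt, \tag{2.9}$$
where $b_2 = (b - \sigma_z^2)/(\sqrt2\,\sigma_z)$, $c = (\sigma_z^2 - 2a)/(\sqrt2\,\sigma_z)$, $c' = (\sigma_z^2 - 2b)/(\sqrt2\,\sigma_z)$.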
a 
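Again
$$\begin{aligned}
g_3(Z_3) &= \int_b^a g(Z_3 - s)\, g_2(s)\, ds\\
&= \frac{1}{\sqrt3\,\sqrt{(2\pi\sigma_z^2)}} \exp\left\{-\frac{(Z_3 - \frac32\sigma_z^2)^2}{6\sigma_z^2}\right\} \int_b^a \frac{\sqrt3}{\sqrt2\,\sqrt{(2\pi\sigma_z^2)}} \exp\left\{-\frac{3(s - \frac23 Z_3)^2}{4\sigma_z^2}\right\} \left[F\left\{\frac{\sqrt2}{\sigma_z}(a - \tfrac12 s)\right\} - F\left\{\frac{\sqrt2}{\sigma_z}(b - \tfrac12 s)\right\}\right] ds,
\end{aligned} \tag{2.10}$$
so that
$$p^{(3)} = \int_a^{\infty} g_3(Z_3)\, dZ_3, \tag{2.11}$$
and
$$q^{(3)} = \int_{-\infty}^{b} g_3(Z_3)\, dZ_3 = \int_{-\infty}^{b_3}\!\!\int_b^a K_1 K_2 K_3\, ds\, dt, \tag{2.12}$$
where
$$K_1 = \frac{1}{\sqrt{(2\pi)}}\exp(-\tfrac12 t^2), \qquad K_2 = \frac{\sqrt3}{\sqrt2\,\sqrt{(2\pi\sigma_z^2)}} \exp\left\{-\frac{3(s - \sigma_z^2 - 2\sigma_z t/\sqrt3)^2}{4\sigma_z^2}\right\},$$
$$K_3 = F\{\sqrt2(a - \tfrac12 s)/\sigma_z\} - F\{\sqrt2(b - \tfrac12 s)/\sigma_z\}, \qquad b_3 = \sqrt3\,b/(3\sigma_z) - \tfrac12\sqrt3\,\sigma_z.$$
Proceeding precisely in the same way, we can calculate $g_4(Z_4)$ and then $p^{(4)}$, $q^{(4)}$ and so on.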
[Fig. 1. Graphs of $-\log_{10} q^{[n]}$ (ordinate, from $-\log_{10}(0.05) = 1.30$ to 2.30) against $p^{(1)}$ (abscissa, from 0 to 1.00).]


3. NUMERICAL ILLUSTRATION

We take ε = 0.05 and choose $\sigma_z$ such that $p^{(1)}$ is fairly high.

(i) Let $p^{(1)}$ = 0.7995, then $\sigma_z$ = 3.4080. Calculations give
$$q^{(1)} = 0.00511, \qquad q^{(2)} = 0.00077, \qquad q^{(3)} = 0.00010,$$
so that $q^{[3]}$ = 0.00598. A rough value of $q^{[\infty]}$ obtained graphically is 0.006.

(ii) Let $p^{(1)}$ = 0.6206, then $\sigma_z$ = 2.7006. We have
$$q^{(1)} = 0.00733, \qquad q^{(2)} = 0.00246, \qquad q^{(3)} = 0.00063,$$
and thus $q^{[3]}$ = 0.01042 and graphical evaluation gives $q^{[\infty]}$ = 0.011 roughly.

(iii) Let $\sigma_z$ be such that $q^{(1)}$ is maximum, which implies that $p^{(1)}$ = 0.50 and $\sigma_z = \sqrt{(2a)}$ = 2.4267. In
this case
$$q^{(1)} = 0.0075, \qquad q^{(2)} = 0.0036, \qquad q^{(3)} = 0.0011,$$
giving $q^{[3]}$ = 0.0122, and $q^{[\infty]}$ obtained graphically is 0.014 roughly.

In each of the three cases, $q^{(2)}$ and $q^{(3)}$ were evaluated by numerical methods. We considered the
intervals $(d_2, b_2)$ instead of $(-\infty, b_2)$ in case of $q^{(2)}$ and $(d_3, b_3)$ instead of $(-\infty, b_3)$ in the case of $q^{(3)}$; $d_2$, $d_3$
were chosen such that the corresponding integrands become negligibly small for values of the arguments
less than $d_2$ in the former case and for values less than $d_3$ in the latter case.
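
The first-step and second-step probabilities can be recomputed directly from the normal integrals of § 2. The sketch below is an illustration rather than the author's procedure: the function names are hypothetical, Simpson's rule is used for the single integral giving the second-step error probability, and the lower limit −12 plays the part of the truncation point $d_2$.

```python
import math

def Phi(x: float) -> float:
    """Standard normal distribution function F(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(t: float) -> float:
    """Standard normal frequency function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def first_step(sigma_z: float, a: float, b: float):
    """Right and wrong decision probabilities at step 1, with z distributed
    normally about sigma_z^2 / 2 with variance sigma_z^2 (hypothesis H0 true)."""
    mean = 0.5 * sigma_z ** 2
    return 1.0 - Phi((a - mean) / sigma_z), Phi((b - mean) / sigma_z)

def second_step_wrong(sigma_z: float, a: float, b: float,
                      lower: float = -12.0, steps: int = 4000) -> float:
    """Wrong-decision probability at step 2 by Simpson's rule (steps must be even)."""
    root = math.sqrt(2.0) * sigma_z
    b2 = (b - sigma_z ** 2) / root
    c = (sigma_z ** 2 - 2.0 * a) / root
    c_dash = (sigma_z ** 2 - 2.0 * b) / root
    f = lambda t: phi(t) * (Phi(t + c_dash) - Phi(t + c))
    h = (b2 - lower) / steps
    total = f(lower) + f(b2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lower + i * h)
    return total * h / 3.0

a = math.log(0.95 / 0.05)   # boundaries for eps = 0.05
b = -a
print(first_step(3.4080, a, b))
print(second_step_wrong(3.4080, a, b))
print(first_step(math.sqrt(2.0 * a), a, b)[0])
```

For case (i) the printed values may be compared with 0.7995, 0.00511 and 0.00077 above; the last line gives the first-step probability of a right decision when $\sigma_z = \sqrt{(2a)}$.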
We find that when the probability of the right decision at the first step is 0.7995, 0.6206, 0.50, the
total risks of error amount to roughly 0.006, 0.011 and 0.014 respectively, as against the maximum
risk ε = 0.05.


The numerical results have been graphically illustrated by drawing the graphs $-\log q^{[n]}$ against $p^{(1)}$
as abscissa (for n = 1, 2, 3, ∞).


My sincere thanks are due to Prof. M. S. Bartlett for suggesting to me this problem. 
CORRIGENDA


(1) Gunnar Blom. 'Transformations of the binomial, negative binomial, Poisson
and χ² distributions', Biometrika (1954), 41, 302.
I am indebted to Mr A. R. Thatcher for drawing my attention to a computational slip
which has led to three errors in Table 1 on p. 310 of my paper which appeared in Biometrika
(1954). In the last column of the table, lower confidence limits are given, being roots of

the equation in p a—4—np =A,,/(npq). (7-5) 


The entries for a = 5 should be 0-051, 0-038 and 0-030 instead of 0-002, 0-001 and <0 
respectively. As a consequence, the remark on the same page that (7-5) is ‘remarkably 
inaccurate for a = 5’ is not justified. 

The correction does not affect the recommendation on p. 310 that the inverse sine 
formula should be used instead of (7-5). It is seen from the corrected Table 1 and has also 
been confirmed by extended calculations that the inverse sine formula, even without the 
skewness correction given in formula (7-3), provides more accurate values for confidence 
levels of 90% and more. For low confidence levels (7-5) is better, the difference between 
the formulae being, however, small. These facts, together with the computational simplicity 
of the inverse sine formula, lead to the recommendation mentioned above. 

In this connexion, it should be mentioned that the extended calculations have proved the 
tentative statement on p. 310 that formula (7-4) is preferable to (7-5) to be incorrect for small 
sample sizes. If, for example, n = 20 (7-5) is, in fact, more accurate. In view of the advantages 
of the inverse sine formula referred to above, this comparison is, however, of little interest. 

It might be added that C. A. Bennett & N. L. Franklin, on p. 606 of their book 
Statistical Analysis in Chemistry and the Chemical Industry (New York: Wiley; London: 
Chapman Hall, 1954) give a quadratic expression due to Freeman & Tukey, which also 
provides confidence limits for a binomial probability. The roots of the quadratic can, with 
the notation used in my paper, be written 


p= prsod. (EE) +7 2 t-P —p*), 


_ 
where Cc, = /0-aen 5): 


This formula bears a certain resemblance to the Cornish-Fisher expansion (7:2) on 
p. 309 of my paper, but it is much more accurate than (7-2) for small sample sizes, even 
when the skewness correction of the last-mentioned formula is used. The performance of 
the Bennett & Franklin formula is about the same as that of the inverse sine formula 
(without skewness correction), the last formula being, however, in general slightly better. 
Therefore, it seems unwarranted to use the method described by Bennett & Franklin, 
except when it is considered to be an advantage to avoid using a table of the inverse sine 
function. 








(2) M. B. Wilk, Biometrika (1955), 42, 70.
P. 74, Table 2, second line: 


for 








"P_ + (bt)? read IP +e 
(¢—1) & ix - zi 
REVIEWS


Choice and Chance by Cardpack and Chessboard. Volume II. By Lancelot Hogben.
London: Max Parrish and Co. Ltd. 1955. Pp. 466. 70s.


This is the continuation of the first volume which appeared in 1950. The sub-title to this, as to the first, 
is ‘An Introduction to Probability by Visual Aids’, but this, in the reviewer’s opinion, is not an 
accurate description. What Prof. Hogben has done is to write two text-books about statistical methods, 
illustrating with diagrams at intervals the various facets of the finite fundamental sets from which 
probability is calculated. As the statistical methods become more complicated the diagrams become
more infrequent, and in this second volume Prof. Hogben has to come down to algebra like everyone 
else, and begins with a chapter on expectation techniques. This is followed by bivariate models, 
simple analysis of variance, moments, sampling distributions, significance tests, regression, covariance 
and factor analysis, and an awkward chapter on sampling in a finite universe. The book is completed 
by a chapter entitled ‘Second Thoughts on Significance’, which reveals a certain naivety of outlook on 
the part of the author. 

It is certain that these two books will be of little use to the research worker in other fields who 
wishes to learn a little statistical method. In spite of the attractively produced diagrams the mathe- 
matical approach is there and must be mastered, and this mathematical approach is not rendered less 
formidable than that of any other statistics text-book by the author’s delineation of it. Serious 
students of mathematical statistics will undoubtedly prefer something more definite, such as, for 
example, Prof. Kendall’s two volumes. They will find in Prof. Hogben’s two volumes, however, a 


certain freshness of exposition which will possibly help them when they come to revise what they 
already know. F. N. DAVID 


Decision Processes. Edited by R. M. Thrall, C. H. Coombs and R. L. Davis. New


York: John Wiley and Sons, Inc.; London: Chapman and Hall Ltd. 1954. 
Pp. viii+ 332. 40s. 


This book contains edited versions of some of the papers delivered at a seminar on the ‘Design of 
Experiments in Decision Processes’ held in the summer of 1952 in Santa Monica, California. The 
authors ‘regard it as a vehicle for raising a number of basic questions and perhaps also providing some 
guideposts towards answers to some of these questions’. 

Perhaps the most important thing for a reviewer in this journal to do is to warn readers that the 
book is concerned only tangentially with the theory of statistical decision functions in the sense of 
Wald, and in fact, because of the relative weight of psychological and economic interest in most of the 
papers, few of them would qualify, by their subject-matter, for publication in Biometrika. This is not 
to say, however, that readers of Biometrika, whose broader interests lie in the directions indicated, 
will not find the papers worth reading. The terse style and self-contained nature of most of the chapters 
make them well suited to reading at odd moments. 

After two introductory chapters, the second devoted to the theory of scales of measurement, there 
follow four parts devoted, respectively, to ‘Individual and Social Choice’, ‘Learning Theory’, ‘Theory 
and Applications of Utility’ and ‘Experimental Studies’. One of the most interesting papers in the 
first section discusses the validity of four possible principles for choosing between alternative courses 
of action, in the light of incomplete information: 

(1) Wald’s minimax rule, or Manicheeism, which amounts to the belief that this world is the worst 
of all possible. 

(2) A rule suggested by Hurwicz, which is a weighted combination of Manicheeism with the Leib- 
nizian notion that this world is the best of all possible. 

(3) Savage’s minimax regret rule, in which we assume, not that the world is the worst of all possible, 
but that the consequences of our ignorance are the worst of all possible. 

(4) Laplace’s rule, in which we attach the same a priori probability to each possible world. 

The author of this paper shows that Laplace’s rule is perhaps the least unsatisfactory of these four 
rules, and goes on to consider further possible principles in accordance with certain conditions which 
he lays down. 
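
The four rules listed above can be illustrated with a minimal sketch in Python. The payoff table, the action names and the optimism weight `alpha` are hypothetical values chosen for illustration only; they do not come from the book under review. Each action maps to its payoffs under three possible states of the world, and larger payoffs are assumed to be better.

```python
# Hypothetical payoff table: action -> payoffs under three possible "worlds".
PAYOFFS = {
    "A": [4, 4, 4],
    "B": [0, 10, 2],
    "C": [2, 3, 9],
}

def wald(payoffs):
    """Wald: assume the worst world, pick the action with the best worst case."""
    return max(payoffs, key=lambda a: min(payoffs[a]))

def hurwicz(payoffs, alpha):
    """Hurwicz: weight the best case by alpha and the worst case by 1 - alpha."""
    return max(
        payoffs,
        key=lambda a: alpha * max(payoffs[a]) + (1 - alpha) * min(payoffs[a]),
    )

def savage(payoffs):
    """Savage: minimize the largest regret (shortfall from the best action per world)."""
    n_states = len(next(iter(payoffs.values())))
    best = [max(row[s] for row in payoffs.values()) for s in range(n_states)]

    def worst_regret(action):
        return max(best[s] - payoffs[action][s] for s in range(n_states))

    return min(payoffs, key=worst_regret)

def laplace(payoffs):
    """Laplace: treat every world as equally probable and maximize the mean payoff."""
    return max(payoffs, key=lambda a: sum(payoffs[a]) / len(payoffs[a]))

if __name__ == "__main__":
    print("Wald:   ", wald(PAYOFFS))
    print("Hurwicz:", hurwicz(PAYOFFS, alpha=0.8))
    print("Savage: ", savage(PAYOFFS))
    print("Laplace:", laplace(PAYOFFS))
```

With this particular table the rules do not all agree, which is the kind of conflict the paper sets out to examine. Setting `alpha` to 0 in the Hurwicz rule reduces it to Wald's rule.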

A good many of the papers discuss in a highly critical spirit the notion of expected utility in connexion 
with decisions, and after reading these papers anyone who believed that rational decisions were those 
which maximized expected utility, in any other than a purely tautological sense, would have his belief 
severely shaken. The section devoted to Learning Theory carries on the development of this most 
interesting application of the theory of Conditional Probabilities which is, in the minds of most statisti- 
cians, associated with the name of Frederick Mosteller. The experimental studies represent perhaps the 
weakest part of the book, in that the problems considered are extremely trivial. It is remarkable that 
no use seems to be made of the very rich sources of experimental material of this kind which exist in 
the records of human activity which we call history, in a broad sense. But this may be being unfair to 
the authors, because it seems that most, if not all, the participants in this seminar recognize that 
their mathematical theories of decision processes must as yet be regarded as very remote from application 
to real human behaviour. G. A. BARNARD 


Stochastic Models for Learning. By R. R. Bush and F. Mosteller. New York: 
John Wiley and Sons, Inc.; London: Chapman and Hall Ltd. 1955. Pp. xvi+365. 72s. 


With the usual type of learning experiment a subject has to make one of a number of alternative 
responses at each of a sequence of trials. Some of these responses are rewarded while others are punished, 
and learning is said to occur if the rewarded responses are made with increasing frequency as the trials 
proceed. In this book, Bush and Mosteller have attempted to describe such learning behaviour by 
means of probability models. A suitable model would appear to require that the probability of a 
particular response occurring at a given trial is some function of all the preceding responses; further, 
that the mean probability for the occurrence of the rewarded responses should increase, while that of 
the punished responses decrease, with successive trials. In order to describe what is essentially a 
non-stationary discrete time series, the authors have constructed their model as a kind of Markov 
process, where the state of the system at each trial is defined in terms of a vector, giving the probabilities 
for the occurrence of each type of response. Theories of learning are not sufficiently explicit to dictate 
the form which such a process should take, and consequently the authors have merely sought to 
describe their data in the most economical manner. This is in contrast with the physicist and the 
geneticist, who are in the enviable position of being able to construct stochastic processes which are 
intimately related to extensive and precisely formulated theories. 
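
A minimal sketch of such a process, for two responses only, may make the description concrete. It assumes the linear-operator form p → αp + (1 − α)λ, which the review does not spell out; the operator parameters, the starting probability and the function names are hypothetical. Here p is the probability of the rewarded response, and the operator applied at each trial depends on which response was actually made, so that the next state depends only on the current state and the current response.

```python
import random

def apply_operator(p, alpha, lam):
    """Linear operator: move p a fraction (1 - alpha) of the way towards lam."""
    return alpha * p + (1.0 - alpha) * lam

def simulate_subject(p0, n_trials, op_after_rewarded, op_after_punished, rng):
    """Return the probability of the rewarded response at each trial for one subject."""
    p = p0
    path = []
    for _ in range(n_trials):
        path.append(p)
        made_rewarded = rng.random() < p  # the response is drawn from the current state
        alpha, lam = op_after_rewarded if made_rewarded else op_after_punished
        p = apply_operator(p, alpha, lam)
    return path

def mean_path(p0, n_trials, op_after_rewarded, op_after_punished,
              n_subjects=2000, seed=0):
    """Average the response probability over many simulated subjects, trial by trial."""
    rng = random.Random(seed)
    totals = [0.0] * n_trials
    for _ in range(n_subjects):
        path = simulate_subject(p0, n_trials, op_after_rewarded,
                                op_after_punished, rng)
        for t, p in enumerate(path):
            totals[t] += p
    return [total / n_subjects for total in totals]

if __name__ == "__main__":
    # Both operators pull p towards 1, so the rewarded response gains ground
    # whichever response is made (hypothetical parameter values).
    means = mean_path(0.2, 30, (0.9, 1.0), (0.8, 1.0))
    print([round(m, 3) for m in means[::5]])
```

In this sketch the mean probability of the rewarded response rises with successive trials, as the review says a suitable model should require. Estimating the α and λ values from real data is the harder problem that the second half of the book addresses.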

The book is divided into two main parts. In the first, a general stochastic process is developed in 
terms of a system of linear operators, together with the derivation of a number of its mathematical 
properties. In particular, recurrence formulae are given for the moments of the various response 
probabilities at the nth trial. The second part deals with the application of special cases of this general 
process to data obtained from a number of learning experiments performed with animal and human 
subjects. It is in this latter half of the book that the authors deal with the problems of estimating the 
parameter values and of assessing the goodness of fit of their models. These problems appear to be of 
great difficulty and the various attempts which Bush and Mosteller have made to overcome them, while 
admittedly of an ad hoc nature, are of value as possibly encouraging further work in a relatively 
unexplored field of statistical theory. A. R. JONCKHEERE 


Introduction to Demography. By Mortimer Spiegelman. Chicago, Illinois: Society 
of Actuaries. 1955. Pp. xxi+309. $6.00. 


This book was commissioned by the Society of Actuaries for the use of students preparing themselves 
for the Society’s examinations. It is thus a parallel work to that of P. R. Cox (Demography, Cambridge 
University Press, 1950). While Cox is mainly concerned with the collection and analysis of British vital 
statistics, Spiegelman naturally places more emphasis on data arising in the United States and Canada. 
There is, for example, a more detailed and comprehensive study of errors occurring in censuses, and 
other vital statistics, than is given by Cox, since such errors have persisted longer, and to a greater 
degree, in the American censuses than in those of Great Britain. 

The book should be admirably suited to the needs of actuarial students in that the items of necessary 
information are set out clearly in (usually) short sections. On the other hand, there tends to be the 
typical text-book drawback of a lack of deep critical discussion. 

Subsequent to a discussion of demographic data and their shortcomings there is a chapter on 
measures of mortality, and comparison of mortality in different populations. The author’s comparison 
of ‘direct’ and ‘indirect’ adjusted rates seems to be less than fair to the latter. A conventional treat- 
ment of life-table construction and mortality projections rounds off the treatment of mortality. 

There follows a brief chapter, too brief in the opinion of the reviewer, on morbidity statistics and a 
treatment of family composition and the various indices of fertility and reproduction. The next chapter 
deals with the distribution of the population, and internal and external migration. Internal migration 
is considered rather cursorily and recent British work on the balance of internal migration is not 
mentioned. The following chapter, on ‘The Working Population’, gives an interesting and clear 
analysis of the structure of the working population of the United States, based on published material. 
The final chapter, on population estimates and projections, gives a fair statement of the problems and 
methods of projection, without, perhaps, sufficient cautionary advice on the value of these projections 
when calculated. 

To sum up, this is a good text-book of real value to students, though it breaks little significantly 
fresh ground in the demographic field. N. L. JOHNSON 


Mortality and Other Investigations. Volume 1. By H. W. Haycocks and W. Perks. 
Cambridge University Press. 1955. Pp. ix+164. 20s. 


This book is one of a series written for the post-war syllabus of the Institute of Actuaries and forms the 
first introduction to the more practical subjects included in the actuarial syllabus. The subjects 
covered are the compilation of mortality and sickness rates, some discussion of the National Life Tables 
and elementary graduation such as the graphic and finite difference methods including, however, 
osculatory interpolation. More advanced topics such as selection, continuous exposed to risk formulae 
and multiple decrement tables are to be dealt with in a second volume. The bulk of the book is devoted 
to the calculation of the ordinary rate of mortality from life office data using the policy year, life year 
and calendar year methods. The treatment is both meticulous and exhaustive. The underlying notion 
used is that of risk time and the various formulae are built up by combining the appropriate portions of 
risk-time. This part of the book seems, however, to suffer from a complete lack of illustrative examples, 
which would probably drive the points home to the average student much more quickly. There is also 
a missed opportunity here in that the straightforward application of such methods to absenteeism 
rates, the repair rate of machines, breakdown of vehicles and so on is nowhere mentioned. 

The later part of the book deals mainly with graduation. After the graphic method has been 
described some appropriate statistical tests for the observed data are given. For the χ² test the statement 
is made, p. 127, that ‘the expected χ²-total is, of course, the number of individual values of χ²’. This 
remark seems, to a statistician, rather surprising unless arbitrary mortality rates are being used, which 
is not so in the case considered where they have been obtained by a graduation. The authors appear 
to realize that all is not well as they then go on to describe circumstances under which the degrees of 
freedom might be reduced by one. 

As a whole the book suffers from the limitations imposed on it by being written for a specific syllabus. 
A more logical plan would require all the exposed to risk, including selection, to be grouped together. 
However, to an actuarial student the book will be a gold mine provided he concentrates on it carefully 
so as to extract the essence of the various procedures. At the same time it is a pity that the book was 
not made to appeal to a wider audience by the inclusion of a few non-actuarial examples. For this 
reason a more appropriate title to the book would be ‘Mortality and Allied Investigations’. 


P. G. MOORE 


Numerical Methods. By Andrew D. Booth. London: Butterworth’s Scientific 
Publications. 1955. Pp. vii+195. 35s. 


The development of automatic digital computers has in recent years given remarkable impetus to the 
study of numerical analysis. This book provides a short, lucid survey of an extensive and rapidly 
growing field. Treated are: interpolation, numerical differentiation and integration, summation of 
series, ordinary and partial differential equations, simultaneous linear equations, non-linear algebraic 
equations, approximating functions, Fourier synthesis and analysis, integral equations. Monte Carlo 
methods for linear, differential and integral equations are briefly mentioned. 

Orientation towards the modern machines is explicit, and an important feature is the revaluation 
of the older methods of hand calculation in the light of experience of automatic computation. The 
treatment is everywhere extremely brief. Even the largest chapter (38 pages) on simultaneous linear 
equations provides a scanty treatment, with little special mention, for example, of simplified procedures 
for symmetric matrices. There is some account of the condition of matrices, but little general discussion 
of errors, more than ever important in lengthy machine computations. No exercises are provided. 

Numerical analysis has not yet become a standard subject in undergraduate mathematical courses. 
It is therefore interesting to note that the present book is based on a course of lectures given by the 
author to final honours students at Birkbeck College. It should be helpful for teaching the basic 
principles; those who have a problem to solve and require a fuller treatment of some particular topic 
will find useful key references included in the text. F. G. FOSTER 


Numerical Analysis. By Z. Kopal. London: Chapman and Hall Ltd. 1955. 
Pp. xiv+556. 63s. 


This book covers many aspects of numerical analysis and in particular such topics as interpolation, 
numerical differentiation and quadrature, differential equations and boundary value problems. The 
style of writing is clear and easy to follow yet no rigour is lost. The standard of mathematics assumed 
corresponds roughly with pure mathematics in the Advanced Level of the General Certificate of 
Education, and new concepts are fully explained as and when they occur. The author does not 
hesitate to use numerical examples to illustrate and drive home his various points if he feels that it is 
better than a mass of symbols by themselves, and this enhances the value of a book designed to illustrate 
numerical methods. 

The proof of interpolation formulae utilizes algebraic methods following from the Lagrangian form 
of formulae where n+1 points of a function are used to fix a polynomial of order n. Alternative deri- 
vations of the main formulae using the quick and neat operational form are given in an appendix. 
At the end of each chapter there are some brief bibliographical notes which should enable anyone 
desiring to pursue the subject further to get a good start. There are also plenty of examples for the 
reader to work out, varying enormously in difficulty—some being straightforward applications of 
bookwork whilst others are more or less research problems. Answers are not given, which is a pity in 
the case of the more purely numerical problems. 
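
The Lagrangian form mentioned above, in which n+1 tabulated points fix a polynomial of order n, can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from the book; the function name and the sample points are hypothetical.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1 through (xs[j], ys[j]).

    The abscissae xs must be distinct.
    """
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        # Basis polynomial L_j(x): equals 1 at xs[j] and 0 at every other node.
        term = yj
        for m, xm in enumerate(xs):
            if m != j:
                term *= (x - xm) / (xj - xm)
        total += term
    return total

if __name__ == "__main__":
    # Three points taken from y = x^2 + 1 fix a polynomial of order 2,
    # so the interpolant reproduces the function away from the nodes as well.
    xs = [0.0, 1.0, 2.0]
    ys = [1.0, 2.0, 5.0]
    print(lagrange_interpolate(xs, ys, 3.0))
```

Because the three sample points lie on a quadratic, the interpolating polynomial coincides with it everywhere; for a general function the agreement is exact only at the tabulated points.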

Naturally any book of this nature has to be terminated at some point, and in this case the emphasis is 
on single variable techniques only. Thus in the very wide coverage given to differential equations the 
absence of any discussion, however brief, on partial equations is to be regretted. Perhaps also to 
complete the book a discussion of the inversion of matrices and an appendix with some tables of 
Besselian and Everett’s interpolation coefficients might have been included. These, though, are minor 
criticisms of what is a very thorough and well-written text-book which should be of use to numerous 
applied scientists. P. G. MOORE 


Lectures on Functions of a Complex Variable. Edited by Wilfred Kaplan. 
Michigan: University of Michigan Press. 1955. Pp. v+433. $10. 


This book is a symposium of thirty-one lectures on recent developments in complex variable theory— 
these having been given at the Conference held for this purpose at the University of Michigan in the 
summer of 1953. The lectures vary, from introductory outlines of different parts of the subject, to 
specialized contributions which are also to be published elsewhere in the appropriate pure- and applied- 
mathematical journals. A considerable proportion of the contributions are on ‘topological analysis’ 
and conformal mapping, but a variety of topics not coming under these heads are also dealt with. It 
is thus a book for the analyst who specializes in certain parts of his subject, and the statistical relevance 
is very indirect. Thus the contributions on ‘The Distribution of Zeros of a Polynomial’, ‘Approximation 
by Polynomials’, ‘Expansion Theorems for Analytic Functions’, etc., may well be of assistance in 
research into the mathematical functions of statistics but they could hardly be called statistically 
important. 

More to the point perhaps, are the surprisingly large number of the papers which deal (in their 
applications at least) with harmonic and other potential functions. For in view of the mathematical 
equivalence of distributions of mass, charge, etc., with those of probability, it is remarkable that the 
vast corpus of potential theory has not borne more statistical fruit. 

The book, however, lacks an index and it is very doubtful whether the publication of such a hetero- 
geneous collection of essays (in quality and content) serves any useful purpose not already catered for 
by existing periodicals and journals. The price is prohibitive. D. E. BARTON 

Department of Statistics, University College, London 


I. Tables of the Digamma and Trigamma Functions. By ELEANOR PAIRMAN, M.A. 
Tables for summing $S = \sum \dfrac{1}{(p_1 i + q_1)(p_2 i + q_2) \cdots (p_n i + q_n)}$, where the p’s and q’s are numerical 
factors. Price 5s. net. 

LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGS’S 
Arithmetica Logarithmica). 

The nine separate sections of this Table have now been issued, and the complete work 
consisting of the logarithms of numbers 10,000-100,000, together with Dr Thompson’s 
General Introduction (98 pp.) is now available in two bound volumes. 


Price £8. 8s. 0d. 